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uncertainty quantification in engineering, written by topic experts who presented 
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and Monte Carlo methods, followed by three chapters on imprecise probabilities: 
introductions to the general theory and to imprecise Markov chains, and a short 
overview of statistical methods. Then attention shifts to reliability theory and simula- 
tion methods for complex systems. The final two chapters are on aspects of aerospace 
engineering, considering stochastic model updating from an imprecise Bayesian 
perspective, and uncertainty quantification for aerospace flight modelling. 
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School was also organised within the UTOPIAE ITN, supported by H2020-MSCA- 
ITN-2016 UTOPIAE, grant agreement 722734. 
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Chapter 1 A) 
Introduction to Bayesian Statistical get 
Inference 


Georgios P. Karagiannis 


Abstract We present basic concepts of Bayesian statistical inference. We briefly 
introduce the Bayesian paradigm. We present the conjugate priors; a computational 
convenient way to quantify prior information for tractable Bayesian statistical anal- 
ysis. We present tools for parametric and predictive inference, and particularly the 
design of point estimators, credible sets, and hypothesis tests. These concepts are 
presented in running examples. Supplementary material is available from GitHub. 


1.1 Introduction 


Statistics mainly aim at addressing two major things. First, we wish to learn or 
draw conclusions about an unknown quantity, 0 € © called ‘the parameter’, which 
cannot be directly measured or observed, by measuring or observing a sequence of 
other quantities called ‘observations (or data, or samples)’ Xi:n := (X1, ..-,; Xn) E X” 
whose generating mechanism is (or can be considered as) stochastically dependent 
on the quantity of interest 0 though a probabilistic model x1:n ~ f (10). This is an 
inverse problem since we wish to study the cause 6 by knowing its effect x1:n. We will 
refer to this as parametric inference. Second, we wish to learn the possible values of 
a future sequence of observations Yi:m € X” given xı:n. This is a forward problem, 
and we will call it predictive inference. Here, we present how both inferences can be 
addressed in the Bayesian paradigm. ! 

Consider a sequence of observables x:n := (x1, . . . , Xn) generated from a sam- 
pling distribution f(-|@) labeled by the unknown parameter 0 € ©. The statistical 
model m consists of the observations x1:n, and their sampling distribution f(-|@) ; 
m= (f(|@); @ € ©). 


‘https://github.com/georgios-stats/UTOPIAE-Bayes. 
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Unlike in Frequentist statistics, in Bayesian statistics unknown/uncertain parame- 
ters are treated as random quantities and hence follow probability distributions. This 
is justified by adopting the subjective interpretation of probability [4], as the degree 
of the researcher’s believe about the uncertain parameter 0. Central to the Bayesian 
paradigm is the specification of the so-called prior distributions dz (0) on the uncer- 
tain parameters 0 representing the degree of believe (or state of uncertainty) of the 
researcher about the parameter. Different researchers may specify different prior 
probabilities, as this is in accordance to the subjective nature of the probability. The 
specification of the prior is discussed in Sect. 1.2. 

The Bayesian model consists of the statistical model f(x1.,|9) containing the 
information about 6 available from the observed data x1:n, and the prior distribution 
x (0) reflecting the researcher’s believe about 6 before the data collection. Itis denoted 
as 


Xinle ~ FCA) 


(f inl), 7(@)) or as 6 ~al) 


Bayesian parametric inference relies on the posterior distribution z (0 |x1:n) whose 
density or mass function (PDF or PMF) is calculated by using the Bayes theorem 


Ff (X1:n|9)1 0) 
Jo f inl) (d0) 


1 (8|X1:n) = (1.1) 


as a tool to invert the conditioning from x1:n|0 to @|x.,. Posterior distribution (1.1) 
quantifies the researcher’s degree of believe after taking into account the observations. 
By using subjective probability arguments, we can see interpret (1.1) as a mechanism 
that updates the researcher’s degree of believe from the prior 7 (0) to the posterior 
x (O |X1:n) in the light of the observations collected. 

Bayesian predictive inference about a future observation y, can be addressed 
based on the predictive distribution defined as 


P(y/Xin) = Í Fy) (dO | X11) = Ex (f Ol0)|Xin). (1.2) 


Essentially, it is the expected value of the sampling distribution averaging out 
the uncertain parameter 9 with respect to its posterior distribution reflecting the 
researcher’s degree of believe. 

Although the posterior and predictive distributions quantify the researcher’s 
knowledge, they are not enough to give a solid answer about the quantity to be 
learned. In what follows we discuss important concepts based on decision theory 
which are used for Bayesian inference. 
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1.2 Specification of the Prior 


Prior distribution x (0) needs to reflect the researcher’s degree of believe about the 
uncertain parameter 0 € ©. Sophisticated prior distributions often lead to ineluctable 
posterior or predictive probabilities, and hence Bayesian analysis. Following, we 
present a computationally convenient class of priors applicable to several scenarios. 


1.2.1 Conjugate Priors 


Conjugate priors is a mathematically convenient way to specify the prior model in 
certain cases. They facilitate the tractable implementation of the Bayesian statistical 
analysis, by leading to computationally tractable posterior distributions. 

Formally, if F = {f (10); V0 € ©} is a class of parametric models (sampling 
distributions), and P = {x (|t); Yt} is a class of prior distributions for 0, then the 
class P is conjugate for F if 


U(O|Xin) EP, VECO) EF andz(-) EP. 


It is straightforward to specify a conjugate prior when the sampling distribution 
is member of the exponential family. Consider observation x; generated from a sam- 
pling distribution in the exponential family 


x0 ~ Efe(u,2,h,¢,0,0); i=l,...,n 


with density Ef, (x|u, g, h, @,0,c)= u(x)g(0) exp} cj OOO 0-1 hj(x))) 
and g(@) = 1/f u(x) exp( j= CjQj OOD h;(x)))dx. The likelihood function 
is equal to 


n k n 
f&n) = | [u O A OA hE). (1.3) 


i=l j=l i=l 
The conjugate prior, corresponding to likelihood (1.3), admits density of the form 
k 


n(6|t) = ray” exp() cj; ()t)) (1.4) 
j=l 


where T = (To,..., Tg) is such that K(t) = Jo g(0)™ exp} cjPj(O)t)dO < 
oo. The resulting posterior of 0 has the form 
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1 To £ * 
(O\xin, T) = ——8(6)" exp() cjo; (0)t})) 
K(t*) = 
with t* = (TÈ, T],.--, TÈ), To = To +n, and t} = re Aji) + tj for j = 1, 


Lyk, 
It is easy to see that (1.4) is conjugate to (1.3) as the posterior can be re-written 
as 7 (0|X1:n, T) = 7 (0|T*) where T* = T + tn (Xin), and ty (Xi:n) = (n, Xi hy (xi), 
.-, Xi] Ak(x;)). You can check the demo in.? 


Example: Bernoulli model (Cont.) 


Consider observations x1:, = (%1,..-,X,) € R” generated from a Bernoulli distri- 
bution with success rate 0 € [0, 1]; i.e., x;|9 ~ Br(@), i = 1,..., 7. Interest lies in 
specifying a conjugate prior for 0. 

The sampling distribution is member of the exponential family, with u(x) = 1, 
g(0)= (1 — 0), c = 1, ġı (0) = log(75), hı (x) = x, because 


0 
1—90 


fŒ10) = Br(x|@) = 6* (1 — 6)'* = (1 — 0) exp(log( )x). 


The corresponding conjugate prior has PDF such as 


0 
1—90 


x (|t) x (1 — 0)” exp(log( Jr) =o “eee, 
where we recognize Beta distribution 7(6|t) = Be(@|a, b), with a = tı + 1, b = 
To — T + 1. Therefore, the posterior distribution is 


T(OlXin, T) = x 0lTo +n, T +$ AG) x OHD- q — g) mui- 


i=1 


which is Be(0|a*, b*), with a* = a + nx, and b* = b + n — nx. 


1.3 Point Estimation 


Often interest lies in learning the ‘true’ value of the unknown parameter 0 € ©, or the 
future values of a future sequence of observations yj... € X”; this is performed via 
the Bayesian point estimator. Here, we demonstrate the theory of the Bayesian point 
estimator in parametric inference, and leave the extension to the predictive inference 
to the reader. 


2 Web-applet: https://georgios-stats- 1.shinyapps.io/demo_conjugatepriors/. 
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Bayes (parametric) point estimator of 6 € © with respect to the loss function 
£(6, 5) and the posterior distribution 7 (8 |x1:n) is an Bayes rule 6” which minimizes 
Jo LO, 8) 7 (dO |x1.n)3 i.e., 


8” in) = arg min Ey (€(0, 8)|X1:n) = arg min f L, 8T (d0 xin). Q.5) 
vôcO v5e0 Jo 


Often the accuracy of the Bayes point estimator is represented by its standard error. 
A commonly accepted metric for the standard error of the j-th dimension of the 


estimator ô” is 
Sez (6; IX1:n) =y MSE, (6; [X1:n) 


where MSE, (ô; |X1:n) = [Ex ((@ — 8)(@ — 8)" |x1:n)];,; is the mean squared error of 
ôj. 

A number of standard Bayesian point estimates, under different loss functions, 
are location summary statistics of the posterior distribution (mean, median, mode, 
quantiles, etc.) You can check the demo in 

The Bayesian estimate of © with respect to the linear loss £(0, 8) = cı (ê — 
A) 1o<s(6) + c2(6 — 5) 1yg<sje (ô) is the ae -th posterior quantile; i.e., 7(9 € (—o, 
Ô (Xin DX:n) = ae The linear loss function essentially allows the adjustment of 
the penalty between over-estimating and under-estimating 0, by adjusting cjand c2. 
In particular, for cı = c2, we get the absolute loss £(@, 5) = |0 — ô| and the posterior 


estimator is the posterior median 


b(X1:n) = median, (8 |x1:n). (1.6) 


The absolute loss is more appropriate when over-estimation and under-estimation 
are of the same concern (as penalized the same). 
The Bayes estimate ô” (x1:n) of @ with respect to the quadratic loss function 
£(0, 8) = (0 — 8)? is 
5” (Xin) = Ex (0|Xi:n). (1.7) 


The posterior mean of 0 as an estimator of 0 essentially minimizes the estimator error 
Sez (Ô|X1:n), which is equal to the posterior standard error. Obviously, the standard 
error of the estimator (1.7) is equal to the posterior standard error. Compared to the 
absolute loss, the quadratic loss aims at over-penalizing large but unlikely errors. In 
fact, quadratic loss aims at minimizing the standard error se, (6|X1:n). 

Finally, the Bayesian estimate of 6 with respect to the zero-one loss £(0, 5) = 
1 — 12,(s)(@) is the posterior mode 


5(X1:n) = mode, (O|x1:n) (1.8) 


ase > 0. 


3 Web-applet: https://georgios-stats- | .shinyapps.io/demo_PointEstimation/. 
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Example: Bernoulli model (Cont.) 


Interest lies in calculating the Bayesian point estimator under the absolute loss 
function. This is the Maximum A posteriori Estimator (the posterior mode). It is 


log(z (@|x1n)) x (n® + a — 1) log(6) + (n — nx +b — 1) log(1 — 9). 


For a > 0, b > 0, 4 log(2(@|x1:n))|p=sx) = 0 implies 5(x) = JHL, Note that 
(a.) if a —> 1, b > 1 (aka w(Ola, b) « 1), then ô” (x) = x similar to frequentists 


stats; (b.) if a > 0, b > 0 (aka z (8ļa, b) x 6-'(1 — 6)~4), then 6(x) = a if 


a —> 1/2,b > 1/2 (aka x (8ļa, b) œ 07"2(1 — 6)-"/), then 8(x) = ==; ifn > 
00, a > 0, b > 0, then 6(x) = x. 


1.4 Credible Sets 


Instead of just reporting parametric (or predictive) point estimates for 0 (or Y1:m), it 
is often desirable and more useful to report a subset of values Ca C © (or Ca © X”) 
where the posterior (or predictive) probability that 9 € Ca (Or Yi:m € Ca) is equal to 
a certain value a reflecting one’s degree of believe. 

The definition below describes the credible set [1, 5]. 


Definition 1.1 (Posterior Credible Set) A set Ca C © such that 


(0 € Calin) =f PC ee ee er 


Ca 


is called ‘100(1 — a)%’ posterior credible set for 0, with respect to the posterior 
distribution z (d8 |x1:n). 


In contrast to the frequentist stats, in Bayesian stats we can speak meaningfully of 
the probability that 6 is in C,, because probability 1 — a reflects one’s degree of 
believe that 0 € Cy. 

Among all the credible sets C, in Definition 1.1, we are often interested in those 
that have the minimum volume. It can be proved [2] that the highest probability 
density (HPD) sets have this property. HPD consider those values of 0 corresponding 
to the highest posterior pdf/pmf (aka the most likely values of 0). 


Definition 1.2 (Posterior highest probability density (HPD) set) The 100(1 — a)% 
highest probability density set for 6 € © with respect to the posterior distribution 
It (@|x-,) is the subset C, of © of the form 

Ca = {0 cO: m (8|X1:n) = ka} (1.9) 


where ka is the largest constant such that 
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m (0 € Calxt-n) > 1—a. (1.10) 


From the decision theory perspective, HPD set C, is the Bayes estimate of C, the 
credible interval under the loss function £(C4, 0) = k|Ca| — 1c,(0), fork > 0 which 
penalizes sets with larger volumes. The proof is available in [2]. 


Example: Multivariate Normal model 


Consider observations x1, . . . , Xn independently drawn from a q-dimensional normal 
N,(u, £) with unknown pw € R4, q > 1, and known &, uo, Xp. Assume prior y ~ 
N, (Ho, Xo). Interest lies in calculating the C, parametric HPD credible interval for 


u. 
The posterior PDF of u is 


m(H1X1n) X f Arn OTW = | [Ng Gaile, DN; ulno, Zo) 


i=1 
1 a a A Po 
xX exp(— z(u = Pay = Ze = fln)) X Ng (utlån, Èn) 


where E, = (n=! + EFH, and fin = Ê, nET! + Dp! yo). So ulxin ~ Ng 


(fin, Ên). 
From Definition 1.2, the credible set has the form 


Ca = {u € RI : m (U|Xin) > ka} 
= {u ER? : (M — fin) SF! (u — fin) < —log(2m det(È,)))ka = ka} 


where ka is the greatest value satisfying 
TEN, (jin, $n) H E Calxin) = 1 = 4 4> 


my C(u — fin)” È, (u — Ân) < ka) = 1-a. (1.11) 


Here, (u — MALD "Cu — fin) ~ X as a sum of squares of independent standard 
normal random variables, and hence ka is the 1 — a-th quantile of the x A distribution; 


i.e., ka =x i Therefore, Ca parametric HPD credible set for ju is 


Ca = {u E RY: (M — fin)” Dy (u — Àn) < x2 1g} 
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Find the 0.9 HPD interval Find the 0.9 HPD interval 
i=) 
ised 
o] — Interval mass..: 0.9015 = | —— Interval mass..: 0.9042 
i — Bounbaries — Bounbaries 
> TEAR T 2 T Ik : 0.0032 
w | o 
= N 
O 
öğ 
2 2 
D G 
= © c 
o] 
a a 
Oo 
3 
lo Oo 
2 4 
o | s 
S ° 
T T T T T T o if T T T T T T 
0.0 0.2 0.4 0.6 0.8 1.0 -60 -40 -20 0 20 40 60 
values values 
(a) Uni-modal case (b) Multi-modal case 


Fig. 1.1 Schematic of 1D HPD set 


In real applications, the calculation of the credible interval might be intractable, 
due to the inversion in (1.9) or integration in (1.10). Below, we present a Naive 
algorithm [1] that can be implemented in a computer.* 


e Create a routine which computes all solutions 6* to the equation 7 (6|x1:n) = ka, 
for a given ka. Typically, Ca = {0 € © : z (0|x1:n) => ka} can be constructed from 
those solutions. 

e Create a routine which computes x (0 € CalXi:n) = Joec; z (0 |xXi:n)d0 

e Numerically solve the equation x (6 € CalX1:n) = 1 — a as ka varies. 


Figure 1.1 demonstrates the above procedure in 1D unimodal and tri-modal cases. 
Specifically, the red horizontal bar denotes k, moves upwards, and intersects the 
density at locations which are the potential boundaries of C,. The bar stops to move 
when the total density above regions of the parametric space is equal to 1 — a. The 
HPD credible set results as the union of these sub-regions. You can check the demo 
in? 

Theorem 1.1 suggests a computationally convenient way to calculate HPD cred- 
ible intervals in 1D, and unimodal cases. The proof is available in [3]. 


Theorem 1.1 Let 6 follows a distribution with unimodal density 7 (0|x1:n). If the 
interval C, = [L, U] satisfies 


1. f? x@|x1n)d0 =1-— a, 
2. m(U) = 2(L) > 0, and 
3. Omode E€ (L, U), where Omode is the mode of 7 (8 |x1:n), 


then interval Ca = [L, U] is the HPD interval of 0 with respect to 1(0|X1:n). 


4 Web-applet: https://georgios-stats- 1.shinyapps.io/demo_crediblesets/. 
5 Web-applet: https://georgios-stats- 1.shinyapps.io/demo_crediblesets/. 
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Example: Bernoulli model (Cont.) 


Interest lies in calculating the 2-sides 95% HPD interval for 6, given a sample with 
n = 30, and eS x; = 15, and prior hyper-parameters a = b = 2. 

The posterior distribution of 6 is Be(a + nx = 17, b + n — nx = 17), which is 
1D and unimodal; hence we use Theorem 1.1. It is 


U 
l-a= f Be(6|17, 17)d@ = Be(0 < U|17, 17) — Be(p < L|17, 17). 
L 


Note that Beta PDF is symmetric around 0.5 when a* = b*, and so is here where 
Be(17, 17). Then, 


1—a=Be(@ < U|I7, 17) — (1 — Be(@ < U|17, 17)) = 2Be(0 < U|17,17)— 1 


so Be(@ < U|17, 17) = 1 — a/2 and L = 1 — U. For a = 0.95, the 95% posterior 
credible interval for 8 is [L, U] = [0.36, 0.64]. 


Remark 1.1 Predictive credible sets for a future sequence of observations y1:m, are 
defined and constructed as parametric ones by replacing 0 with yi:m and 7 (x1:n|0) 
with p(y1:m|X1:n) in Definitions 1.1 and 1.2, and their consequences in this section. 
It is left as an Exercise. 


1.5 Hypothesis Test 


Often there is interest in reducing the overall parametric space © (aka the set of 
possible values of that the uncertain parameter 0 can take) to a smaller subset. For 
instance; whether the proportion of Brexiters is larger than 0.5 (p > 0.5) or not 
(p < 0.5). 

Such a decision can be formulated as a hypothesis test [1], namely the decision 
procedure of choosing between two non-overlapping hypotheses 


Ho: 8€@) vs H:8c0 (1.12) 


where {©o, ©1} partitions the space ©. Typically, hypotheses, {H;}, are categorized 
in three categories. Single hypothesis for 0 is called the hypothesis where ©; = {0;} 
contains a single element. Composite hypothesis for 0 is called the hypothesis where 
©; C © contains many elements. General alternative hypothesis for 6 is called the 
composite hypothesis where ©; = © — {6} when it is compared against a single 
hypothesis Ho : 0 = 6o. It is denoted as H; : 6 Æ 6o. 

Based on the partitioning implied by (1.12), the overall prior x can be expressed 


as (0) = mo X To(0) + 71 X 71 (0) where mg = So. z (d8), and z (0) = SOE 
an 
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Here, zo, and xı describe the prior probabilities on Ho and Hj, respectively, while 
ty(@) and xı (6) describe how the prior mass is spread out over the hypotheses Ho 
and H4, respectively. 
We could see the hypothesis testing (1.12) as parametric point inference about the 
indicator function 
0,0 E0 


l6M=), o" (1.13) 
kd 1 


To estimate (1.13), a reasonable loss function (0,8) would be the cy — cy loss 


function 
0 ,if6 € Oo, 6=0 


E 0 ,if6¢O,5=1 
"Jen , if € Qo, 6=0 
cy , if6€@,5=1 


(1.14) 


where cy > 0 and cy > 0 are specified by the researcher. Here, cy > 0 (and cy > 0) 
denote the loss if we decide to accept Hp (and H; ) while the correct answer would be 
to choose H; (Ho). According to (1.5), under (1.14), the Bayes estimator of (1.13) is 


Stave 0 , ifm € Oo|Xi:n) > oe (1.15) 
n 1 , otherwise i 


where z (0 € @o|X1-n) = to, z (d |x1:n). In other words, hypothesis H; is accepted 


jf Z@eOolrin) — co 
1(OEO| Xin) cI 
Hypothesis tests in Bayesian statistics can also be addressed with the aid of Bayes 


factors. Bayes factor Boj (x1:n) is the ratio of the posterior probabilities of Hp and H, 
over the ratio of the prior probabilities of Hp and H; 


(0 € OolXin)/ T0 € Oo) 


B a= - 1.16 
nn) = 5 @ € Oilen)/r0 € O1) van 
ean ; Ho : single vs H; : single 


91) 
Joo f (X1:n|0)700(d9) 
~ To, f 1:n|9)701 (dO) 


; Ho : composite vs Hı : composite , (1.17) 


RCM ICo ; Ho : single vs H; : composite 
o,f On 


Under the cy — cy loss function, (1.15) implies that one would accept Ho if Boi (x1:n) > 
a and accept H; if otherwise. Alternatively, Jeffreys [6] developed a scale rule 
(Table 1.1) to judge the strength of evidence in favor of Hg or against Hg brought by 
the data. Although Jeffreys’ rule avoids the need to specify cz and c;,, it is a heuris- 
tic rule-of-thumb guide, not based on decision theory concepts, and hence many 
researchers argue against its use. 
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Table 1.1 Jeffreys’ scale rule [6] 


Bot log 9 (Bor) Strength of evidence 

(1, +00) (0, +00) Ho is supported 

(10-!/2, 1) (—1/2, 0) Evidence against Ho: not 
worth more than a bare 

(10-!, 1071/2) (—1, -1/2) Evidence against Ho: 
substantial 

(1073/2, 107!) (—3/2, -1) Evidence against Ho: strong 

(10-2, 1073/2) (—2, —3/2) Evidence against Ho: very 
strong 

(0, 10-7) (—oo, —2) Evidence against Ho: decisive 


Example: Bernoulli model (Cont.) 


We are interested in testing the hypotheses Hp : 0 = 0.5 and H; : 6 Æ 0.5, given 
that mo = 1/2, and using the cr — cy loss function with cj = cy. Here, Oo = 
{0.5} and ©; = [0, 0.5) U (0.5, 1]. The overall prior is 7(@) = zola (0) + (1 — 
xo)Be(0 |a, b). The Bayes factor is 


Ii- Bri!) = 05° (1 — 69)" 
Jon Tir Br@il@)Be(Ola, b)dd  B(n¥ + a, n — nx + b)/B(a, b) ` 


Bou (i:n) = 


Given a = b = 2, n = 30, and XO? x; = 15, it is Bo (xin) = 18.47 > cn/cr = 1. 
Hence, we accept H1. 


1.5.1 Model Selection 


Often the researcher is uncertain which statistical model (sampling distribution) can 
better represent the real data generating process. There is a set M = {m,, m2, ...} of 
candidate statistical models mg = {fk (Pk); Ge E k}, where fi (-|~x) denotes the 
sampling distribution, and g denotes the unknown parameters for k = 1, 2,... Let 
Tk = T(m) denote the marginal model prior and 2;,(¢;,) = 7 (pk|m) denote the 
prior of the unknown parameters gx of given model mx. 

Selection of the ‘best’ model from a set of available candidate models can be 
addressed via hypothesis testing. For simplicity, we consider there are only two 
models mp and m; with unknown parameters ¥) E€ ®p and 7 € ®;. Then, model 
selection is performed as a hypothesis test 


Ho: (m, g) € Oo vs Hı: (m, g) € ©; (1.18) 
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where ©; = {my} x By, © = Uz Ox. The overall joint prior is specified as x (m, p) = 
To X Tolpo) + mı X 11 (G1) on (mM, g) € © where © = UO, where mg(pk) = 


m(m,9) 1m, (m) _ . 
PEC on Øg € Pz, and m = je z (mg, døg). Now the model selection prob- 


lem has been translated into a hypothesis test. 


Example: Negative binomial vs. Poisson model [2] 


We are interested in testing the hypotheses 
Ho : x;|¢ ~ Nb(ġ, 1), @>0, vs. Hy: x|A~ Pn), A>O 


by using the cy — cy loss function with cy = cy. Consider two observations x; = x2 = 
2 are available. Consider overall prior x (0) with density 7(@) = mpBe(@|ag, bo) + 
m\Ga(Ala,, bı) with To = Tı = 0.5. 

This is a composite vs. composite hypothesis test. It is 


I (do + bo) f n+ao—1 nxt+bo—1 
X1-n|Po)0(dgo) = ————— "(1-6 o do 
A fK 1; | 0) o( 0) T( DT (bo) A ( ) 


_ To + bo) T(n + ao) (nx + bo) 
P'(ao) PF (bo) T(n + nx + ao + bo) 


b” (oe) 7 
Xino) (dg) = 1 n f NEFA! exp(—(n + b1)à)dà 
7 f X1nlgi)m1 (d1) Tana Eby a |, p(—¢ A) 
T'(nx + ay) 1 


T (aı)(n + by )”* +a T xi! 


— Ibo) T(n+a)r(nž+bo) T(a)(n+b) t n i ; 
and hence Bo (X1:n) = DUES Tenab — rasta) [i x:!. Itis Bor @i:n) 


= 0.29 > 1, and hence I accept H, and the Poisson model. 
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Chapter 2 A) 
Sampling from Complex Probability PAE 
Distributions: A Monte Carlo Primer for 
Engineers 


Louis J. M. Aslett 


Abstract Models which are constructed to represent the uncertainty arising in engi- 
neered systems can often be quite complex to ensure they provide a reasonably 
faithful reflection of the real-world system. As a result, even computation of simple 
expectations, event probabilities, variances, or integration over utilities for a decision 
problem can be analytically intractable. Indeed, such models are often sufficiently 
high dimensional that even traditional numerical methods perform poorly. However, 
access to random samples drawn from the probability model under study typically 
simplifies such problems substantially. The methodologies to generate and use such 
samples fall under the stable of techniques usually referred to as ‘Monte Carlo meth- 
ods’. This chapter provides a motivation, simple primer introduction to the basics, and 
sign-posts to further reading and literature on Monte Carlo methods, in a manner that 
should be accessible to those with an engineering mathematics background. There 
is deliberately informal mathematical presentation which avoids measure-theoretic 
formalism. The accompanying lecture can be viewed at https://www.louisaslett.com/ 
Courses/UTOPIAE/. 


2.1 Motivation 


There is a natural tension when constructing a probabilistic model with the aim of 
encapsulating the uncertainty in an engineered system: on the one hand, there is a 
desire to capture every nuance of the system to fully reflect all knowledge about 
its behaviour; on the other, there is a drive towards parsimony for reasons of inter- 
pretability, robustness, and computability. Interpretability and robustness are impor- 
tant goals and should indeed guide a reduction in model complexity, but reducing 
model complexity purely to enable computability would seem a hinderance, espe- 
cially if that parsimony impedes answering the research questions at hand since, put 
simply, ‘reality can be complicated’ [7]. As such, the methodology of this chapter 
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should not be employed simply to enable an inappropriately complex model, but 
rather serves to facilitate the use of models which are complex enough when judged 
by purely subject matter and statistical concerns. 

Monte Carlo methods have played a crucial role in a vast array of applications 
of statistical methodology, from the prediction of future marine species discoveries 
[29] through to reconstruction of the ancient climate on Earth [16]; from criminal 
justice offending risk [17] to inferring networks of corporate governance through 
the financial crash [12]; and from estimating bounds on engineering system survival 
functions [11] to the assessment of offshore oil production availability [30]. The 
utility of Monte Carlo in these applications varies substantially, from estimation of 
confidence intervals and event probabilities, through optimisation methods to full 
evaluation of Bayesian posterior distributions for parameter inference. 

With this breadth of application in mind, we may assume hereinafter that we 
have a probabilistic model for some engineered system of interest which—after 
considering all subject matter and statistical concerns—is too complex to be able 
to compute relevant quantities of interest (be they event probabilities, confidence 
intervals, posterior distributions, etc.). As a concrete example, if one were to construct 
a Bayesian model of reliability using ideas introduced in Chap. 1, then our model 
would comprise some prior distribution over the vector of model parameters, 7 (0), 
together with a generative model for the failure time depending on those parameters, 
x (t | 0). After collecting some lifetime data t = {t1, ..., tn }, the most simple research 
question of interest may be the posterior expected value of the parameters: 


nlo = f @x@inao=— | 6 xe] ] xn |940 (2.1) 

2 2 i=1 

where Q is the space of all possible parameter values and c is a normalising constant. 
Indeed, it is traditional in Monte Carlo literature to focus attention on the compu- 

tation of expectations with respect to some probability density under consideration, 

which need not necessarily be a Bayesian posterior. That is, given a general proba- 

bility model xz (x), x € Q, and a functional f : Q — R, interest is typically in: 


Sal f(X)] = | Frada (2.2) 


and this is the perspective that will be adopted in this chapter. 

We complete our motivation of Monte Carlo in this Section by highlighting the 
generality of expectations of the form (2.2), followed by a short discussion of standard 
numerical integration techniques. In Sect. 2.2, the Monte Carlo estimator and its error 
analysis are recapped and contrasted with numerical integration. The core methods of 
Monte Carlo simulation are introduced in Sect. 2.3, with pointers to more advanced 
material in Sect.2.4. Note that we will in places abuse formal notation where we 
believe it aids intuitive understanding since the goal of this chapter is to be a basic 
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primer, not a rigorous treatment.! A first course in probability and statistics are 
assumed background. 

The accompanying lecture from the UTOPIAE training school can be viewed at 
https://www. louisaslett.com/Courses/UTOPIAE/. 


2.1.1 Generality of Expectations 


The formulation in (2.2) may appear rather restrictive to the uninitiated reader. How- 
ever, considering only expectations of this form does not result in any loss of gener- 
ality. For example, (re-)defining: 


1 n 
n(x) := zr) | [xt |x) 


i=l 


fœ) :=x 


means that (2.2) simply becomes the posterior expectation in (2.1). However, one 
should note that arbitrary statements of probability are also computable as expecta- 
tions. That is, 


P(X < a) = f ere / Ico (xr (x) dx = Ex Mo, (X)] 
Q 


—oo 
where for a general set E C Q, 


1 ifxeE 


lr) = V0 fxg E 


That is, to evaluate the probability of an arbitrary event, P(X € E), simply set 
f(X) := Ig(X) when evaluating (2.2). 


2.1.2 Why Consider Monte Carlo? 


In some special cases, the integral (2.2) may have an analytical solution and in such 
situations one should not resort to Monte Carlo or other methods. When there is 
no known analytical form for the integral, a reader with a general mathematical 


' For example, we will write ‘P(X = x)’ even where X is continuous to emphasise the link to the 
density function and will use 7 (x) to reference both a target distribution or prior where the meaning 
is clear from context. For the more advanced reader there are already many excellent more rigorous 
treatments in the literature, some of which we reference towards the end. 
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background may be tempted to reach for a numerical integration method, such as a 
simple mid-point Riemann integral or a more sophisticated quadrature approach. 

Consider the mid-point Riemann integral in the simple 1-dimensional setting. 
Letting g(x) := f (x)z (x), then the expectation would be approximated using n 
evaluations by: 


a 


rig aia ol 
Xx;i=a —.). 
4 n J 2 


The absolute error in using (2.3) is bounded [24, Theorem 7.1]: 


4 b-ac 
f g(x) dx = —— } 80), (2.3) 
j=l 


where 


b n 3 
b-—a (b — a) i 
aa D NS ame a wl: 


Clearly, pial : mMaXa<z<p |g” (z)| is fixed by the problem at hand and cannot be altered 
by the engineer, so we achieve the accuracy we require by controlling n~*— that is, by 
using a finer grid to compute the integral. As such, we say the error in the mid-point 
Riemann integral in 1 dimension is O (n~*)—that is, if double the computational 
effort is expended by computing on a grid of twice as many points (2n), then the 
worst case error is reduced by a factor of 4. This fast reduction in error and an explicit 
bound on it are very attractive properties. 

However, as the dimension of x increases, the Riemann integral’s effectiveness 
diminishes substantially. In general, the error of mid-point Riemann integration in 
d-dimensions is O (n= g ). For example, even in a modest 10-dimensional problem, 
when the computational effort is doubled the worst case error is only reduced by a 
factor of ~ 1.15. Put another way, to halve the worst case error in a 10-dimensional 
problem requires exp (£ log 2) = 32 times the computational effort. This problem 
has been coined the ‘curse of dimensionality’. 

Of course, the Riemann integral is not the best numerical integration method, 
but even Simpson’s rule only improves this to O (n= a), In general Bakhvalov’s 
Theorem bounds all possible quadrature methods by O(n~"/“), where r is the number 
of continuous derivatives of g(-) which exist and are exploited by the quadrature rule 
[24]. 

The striking result which motivates the study of Monte Carlo methods is that for a 
d-dimensional problem, the (mean-square)? error is O (n *). The most important 
point to note is the absence of d in the order of the error: increasing the computa- 
tional effort by some fixed amount has the same relative effect on the worst case 
error regardless of dimension. Of course, the devil in the detail is that the constant 


? Note that randomised simulation methods such as Monte Carlo typically report mean-square error 
rather than absolute error bounds. 
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Method 
— Monte Carlo d-dim 


— Riemann 2-dim 


Order of error 


— Riemann 6-dim 


1 1 1 
100 1000 10000 
n 


Fig. 2.1 The order of error reduction—that is, only the leading O (n fd ) term—is plotted against 
different computational effort n. Note that all these curves would be multiplied by a different (fixed) 
problem dependent constant 


factor which we are ignoring in that statement almost certainly has some dimension 
dependence, but this is true for quadrature methods too. Figure 2.1 illustrates the 
differences. 

Consequently, Monte Carlo methods are well suited to address the problem of 
analysing complex probabilistic models of engineered systems, since this is precisely 
a setting where the parameter dimension is likely to be large. 


2.2 Monte Carlo Estimators 


The standard Monte Carlo estimator of the integral (2.2) is 
ie n 
nê f f(x)m(x)dx = -Y f(aj) 2 A, (2.4) 
Q ea 


where x; ~ 2(-). In other words, the problem of integration is transformed instead 
into the problem of drawing random samples x; distributed according to the proba- 
bility density 7 (-). Importantly, this estimator is unbiased, that is, E[Z] = u. 

If the samples x; are independently and identically distributed (iid) according to 


z (-), then the root mean-square error of the estimator (i is 


2 


1 n 
RMSE := |E, [ feommax - ro =< 
j=l 


= 
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where o° = Var, ( f (X)). Again, part of this error is (mostly) inherent to the prob- 
lem*—o in this case—so that we achieve desired accuracy by controlling n~!/. 
There are at least three very attractive conclusions we can draw from this form: 


1. as mentioned already, the relative error reduction achieved by additional com- 
putational effort is independent of dimension; 

2. there is no explicit dependence on how smooth the functional, f (-), or probability 
density, 2 (-), are (though these may influence o); 

3. in contrast to quadrature methods, an estimate of the error can be computed 
from the work already done to compute the integral, by computing the empirical 
standard deviation of the functional of the samples drawn from z (-). 


Although an absolute error is not available for a randomised method like this, a 
simple application of Chebyshev’s inequality does provide a probabilistic bound on 
the absolute error exceeding a desired tolerance: 


> A 1,)2 2 
PU — pl >) < Exh —~ W)"] _ 9 l 
Dog e2 ng? 


Indeed, it is also possible to invoke the iid Central Limit Theorem so that asymp- 
totically, 


where ®(z) denotes the standard Normal cumulative distribution function (CDF). 
This enables the formation of confidence intervals for u based on large n samples. 

The discussion to date has tacitly assumed that simulating from arbitrary proba- 
bility distributions 7 (-) is possible and relatively efficient. In fact, most Monte Carlo 
research is devoted to this effort since, as touched on above, there is rich and well- 
established theory when such samples are available. Therefore, for the remainder of 
this chapter, our attention turns away from discussion of the integrals which are of 
primary interest and focuses on the problem of simulating from arbitrary probability 
distributions z(-). Once these samples are available, the results above can be used to 
analyse the resulting estimators. 


2.3 Simple Monte Carlo Sampling Methods 


In this section we introduce some simple Monte Carlo methods which enable sam- 
pling from a wide array of probability distributions. Note that understanding these 
simple methods is crucial as they are extensively used as building blocks of more 
sophisticated sampling methodology. 


3 There are advanced Monte Carlo methods which can reduce this variance, but this is beyond the 
scope of this chapter. See for example [24, Chap. 8]. 
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Almost all Monte Carlo procedures start from the assumption that we have avail- 
able an unlimited stream of iid uniformly distributed values, typically on the interval 
[0, 1] c R. How to generate such an iid stream is beyond the scope of this intro- 
ductory chapter, but the interested reader may consult [13, Chaps. 1-3] and [21]. 
Arguably the current gold standard algorithm remains that in [22]. Typically, the 
average user of Monte Carlo need not worry about such issues and may rely on the 
high quality generators built into software such as R [26]. 

Thus the objective hereinafter is to study how to convert a stream u; ~ Unif(0, 1) 
into a stream x; ~ x (-), where x; is generated by some algorithm depending on the 
stream of u;. In more advanced methods (see MCMC), x; may also depend on x; 
or even Xi, ...,Xj-1. 


2.3.1 Inverse Sampling 


Arguably the simplest example of generating non-uniform random variates is inverse 
sampling, which typically applies only to 1-dimensional probability distributions 
(though higher dimensional extensions have been studied). Let F(x) := P(X < x) be 
the cumulative distribution function (CDF) for the target probability density function 
m(-). Then, inverse sampling requires the inverse of the cdf, F~'(-), which is then 
applied to a uniform random draw. Precisely, see Algorithm 2.1. 


Algorithm 2.1 Inverse sampling algorithm 


1: procedure INVERSE SAMPLING(F~!(-)) > Generate random sample from distribution with 
inverse CDF F~!(.) 

2 u ~ Unif (0, 1) 

3 x < F`! (u) 

4: return x 

5: end procedure 


To prove that the sample returned by Algorithm 2.1 is distributed according to x (-) 
is straight-forward. We do so by computing the CDF, P(X < x), of the X generated 
by this algorithm and show that this agrees with the CDF of z(-). The first step 
substitutes X = F~'(U), where U ~ Unif(0, 1), as per the algorithm: 


P(X <x) =P(F'(WU) <x) 
= P( F(F-!(U)) < F(x)) applying F (-) to both sides 
=P(U < F(x)) Uniform CDF P(U < u) =u 
= F(x). 


Note that applying F (-) to both sides in the second line is valid, since the cumulative 
distribution function is a non-decreasing function by definition. 
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Fig. 2.2 Inverse sampling for both a continuous distribution function (left) and one containing 
jump discontinuities and regions of zero probability (right). Uniform random draws u are sampled 
and inverted through the distribution function in the obvious way (left), or by taking the infimum 
over values of x such that F (x) > u (right). In the right illustration, the hypothetical (“Hypothetical’ 
since strictly speaking this is an event of probability zero) uı coincides with the value at which 
F(x) is constant and u2 lies within the jump discontinuity 


One subtlety to be aware of is that for discrete distributions or continuous distri- 
butions with jump discontinuities or areas of no support, we must define: 


F~! (u) = inf{x : F(x) > u}, Vu € [0, 1]. 


It may be tempting when F7! (-) is not available to use a numerical solver to solve 
F(x) = u in place of line 3 in Algorithm 2.1. However, caution is required since 
this can result in bias [10, p. 31]. The procedure of inverse sampling is illustrated in 
Fig. 2.2. 

Notice that this is univariate, yet earlier we saw that numerical integration will give 
better error bounds than Monte Carlo for low dimensional problems—as such one 
may choose not to use inverse sampling to actually evaluate univariate expectations. 
However, we often need a set of random draws from non-uniform univariate distri- 
butions which feed into a broader Monte Carlo algorithm, which is itself sampling in 
higher dimensions: in such situations inverse sampling is very useful. Indeed, if you 
use the rnorm function in R [26], it has used inverse sampling to generate random 
draws from the Normal distribution since 2003 (see /src/nmath/snorm.c lines 
265-270), prior to that using [18] since at least v0.62 in 1998. 

A final comment: inverse sampling is a special case of general transformation 
sampling. If one can generate samples from one distribution, there may be an appro- 
priate transformation to turn these into samples from another distribution that may 
be more tractable or faster than inverse sampling. For further details, see for example 
[24, Chap. 4.6]. 
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2.3.1.1 Example 


In order to use inversion sampling for a Weibull distribution with shape k and scale 
o, X ~ Weibull(k, o), we note that 


k = ee 
x(x) = — (=) at, x € [0,œ),o >0,k>0 
o 


o 
F(x) = t-ap- (=) }. 


To find F~'(u), set 1 — exp [- ( y = u and solve for x: 


=> x = F`! (u) = ø (—log(1 —u))!/* ~ x). (2.5) 


In order to generate samples from the Weibull we, therefore, take values, u, from a 
Uniform random number stream and transform them using (2.5). 


2.3.2 Rejection Sampling 


Our first higher dimensional method is an elegant algorithm, which actually crops up 
in more advanced guises at the cutting edge of modern Monte Carlo methods (e.g. 
[9, 25]). Here, the goal is to find another distribution, say 7 (-), which is easier to 
sample from (perhaps even using inverse sampling) and where we can construct a 
bound on the density function: 


m(x) <cmt(x) Vx EQ, (2.6) 


where c < oo and where z and 7 need not be normalised probability densities. We 
call 7(-) the ‘proposal’ density, since samples will be drawn from this and then 
exactly the correct proportion of them retained in order to end up with a stream of 
samples from v7 (-). The full procedure is detailed in Algorithm 2.2. 

We will proceed based on the assumption that 7 (-) and 7 (-) are normalised den- 
sities. However, note that the algorithm is also valid for un-normalised densities, so 
long as there still exists a c satisfying (2.6) for the un-normalised densities. 

The efficiency of the algorithm hinges entirely on the value of c, so that it should 
be chosen as small as possible. This is because, letting A be the random variable for 
acceptance of a proposed sample X, the acceptance probability is (abusing notation 
to aid intuition): 
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Algorithm 2.2 Rejection sampling algorithm 


1: procedure REJECTION SAMPLING(z(-), 7(-), c) > Generate random sample from distri- 
bution with unnormalised density 7 (-) 

2: a <- FALSE 

3: while a = FALSE do œ Repeat until acceptance 

4: u ~ Unif(0, 1) 

5: xaT) > Propose a possible sample 

6: if u < zo then > Accept or reject proposal? 

T a < TRUE 

8: end if 

9: end while 


10: return x 
11: end procedure 


P(A= = f P(A=1|X=x) P(X=x) dx 
Q ——— — i 


Prob line 6 of Alg 2 gives TRUE. Proposal density f. 


=| P(u< =") (x) dx 


a m 
Uniform CDF, P(U <u)=F (u)=u. 


z aE ET 
Q 


cit (x) 


== f rax 
C JQ 


1 
=e (2.7) 


C 


where Q2 is the support of 7 (-), T(x) > 0, Y x € Q. The final line follows because 
the integral of a density over the whole space is 1. 

Hence, the number of iterations of the loop on lines 3—9 in Algorithm 2.2 which 
must be performed to return a single sample from x (-) is Geometrically distributed 
with parameter L, Therefore, the expected number of random number generations 
and function evaluations which must be performed is 2c per sample from x (-). 

To see that Algorithm 2.2 does indeed give a sample from x (-), we note that the 
samples returned are only those which are accepted, so we condition on this event: 


PX e gja = p= eee VEEB 


m2) (x) dx 


E cã(x) 
1 


E 


= f (x) dx. 
E 
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Fig. 2.3 Geometric interpretation of rejection sampling. First y is sampled from 7 and then a 
uniform is sampled along the vertical dashed line at that location (i.e. between 0 and cz (y)). If the 
uniform sample falls below z then we accept and otherwise we reject. It is therefore clear that the 
closer cz ‘hugs’ x the more efficient the rejection sampler 


The last line is the probability of event E under the distribution with density 7 (-), as 
required. 

There is a nice geometric interpretation of the rejection sampling algorithm which 
aids greatly with intuition. Notice that the condition in line 6 can be rewritten 
uci (x) < a(x). This means ucz (x) is a uniform random number in the interval 
[0, cz (x)], so that we can view rejection sampling as first drawing a value from 
T(x), then moving it up to a uniformly distributed height under the curve cz (x). 
The consequence of this is that we are effectively sampling uniformly points under 
the curve cz (x) and accepting those that fall under the curve x(x), as depicted in 
Fig. 2.3. 

Care is required with rejection sampling in high dimensions because it is quite 
easy for the acceptance probability to become so small as to make the technique 
impractical. We will see that, as well as how to implement rejection sampling, in the 
following example. 


2.3.2.1 Example 


Consider the problem of sampling from a zero mean d-dimensional multivariate 
Normal distribution, having density: 


1 
(x) = (27) ~4/? det( £) exp (-"="x} : xe R’, 


where & is ad x d symmetric positive semi-definite covariance matrix. It is com- 
paratively easy to sample univariate Normal random variables (e.g. using inverse 
sampling in R [26] as mentioned earlier, or via a transformation type approach like 
[3]). Thus we could consider using a multivariate Normal with diagonal covariance, 
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oI, as a proposal, because this simply requires sampling d univariate Normal ran- 
dom variables. 
This would mean we need to determine c < œo such that 


1 1 
cdet(o? N7! exp (-3x"o 7x) > det(x)~!/? exp (-3x"2"'x} ¥xeR¢ 
(2.8) 


E is symmetric, so it has eigendecomposition E = QAQ™ => E~! = QA'O?, 
where Q is an orthogonal matrix and A is a diagonal matrix with entries consisting 
of the eigenvalues 4,,..., Ag. The orthogonal transformation y = QTx also spans 
R4, so that (2.8) <> 


-1/2 

1 1 

co“ exp (-3y'o71y) > (i n) exp (-3y"4"'y) , VyeR? 
d d 

4> 2logc > Yio? — Ary? +2dlogo — Slog Ai. 


i=l i=l 


Ifo? < ar! for any i, then the right-hand side cannot be bounded above (since the 
inequality must hold Y y; € R), so we must have max; à; < o° and then clearly c 
is minimised for 0? = max; A;. Since every term in the first sum of the right-hand 
side is necessarily negative, the right-hand side is maximal for y; = 0 V i, so that the 


optimal c is 
dj2 [a =1/2 
c= (max is) (i n) ; 


i=1 
when o? = max; Aj. 

In summary, there is a constraint on our proposal z(-) when it is an uncorrelated 
multivariate Normal density, or else it cannot bound v (-). Moreover, we can explicitly 
compute the optimal proposal variance, ø?, to give us the highest possible acceptance 
rate. 

To make this example concrete, consider rejection sampling in this setting where 


1 0.9---0.9 

0.9 1 ---0.9 
L= 

0.90.9... 1. 


Note that & can be written as 0.17 + B, where B is a matrix with 0.9 in every 
element. The rank of B is 1, so it has a single non-zero eigenvalue which must 
therefore equal tr(B) = 0.9d, and the eigenvalues of 0.17 are all 0.1. Further- 
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Table 2.1 Optimal proposal variance and acceptance probability for rejection sampling a correlated 
multivariate Normal distribution using an uncorrelated multivariate Normal proposal 


d o? = max; Aj Acceptance probability 1 
i 1 1 

2 1.9 0.229 

3 2.8 0.036 

4 3.7 0.004 

5 4.6 4.45 x 1075 

10 9.1 1.53 x 107° 


more, 0.17 and B commute, therefore the eigenvalues of & are the sum of these 
eigenvalues: that is, A; = 0.9d + 0.1 and à; = 0.1 Vi £1. As the dimension of 
z(-) increases, the spectral gap increases linearly and thus c grows very fast: 
c = (10A,)¢")? = (9d + 1)¢-/? Indeed, this is faster than exponential and faster 
than factorial growth! Consequently, for growing dimension, the acceptance proba- 
bility falls super-exponentially fast—not a desirable property. See Table 2.1 for some 
example values. 

A whimsical observation to emphasise the problem: a modern laptop can pro- 
duce roughly 15 million univariate Normal samples per second and the universe 
is estimated to be 4.32 x 10!” seconds old. Ignoring the time to evaluate the uni- 
form draw u or acceptance/rejection, this means the expected number of samples 
that would be generated by Algorithm 2.2 for this multivariate Normal problem in 
d-dimensions—if run for as long as the universe has existed—would be 


1.5 x 107 x 4.32 x 10! 
d(9d + NDP 


Consequently, even knowing the exactly optimal choice for ø? in our proposal, this 
would only be expected to render 5 samples for a 21-dimensional multivariate Normal 
with the innocuous looking & given above—rejection sampling in high dimensions 
can be problematic! 


2.3.3 Importance Sampling 


The final core standard Monte Carlo method we cover in this primer also starts from 
the perspective of having a proposal density 7 (-), though we no longer require it to be 
able to bound z(-). Importance sampling then dispenses with the notion of directly 
generating iid samples from v (-) and focuses on their use: in computing expectations 
using those samples in (2.4). Consequently, importance sampling weights the samples 
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from 7 (-) in precisely the proportion that ensures these weighted samples produce 


expectations which are concordant with expectations under z(-) when used in (2.4). 
This is laid out precisely in Algorithm 2.3. 


Algorithm 2.3 Importance sampling algorithm 


1: procedure IMPORTANCE SAMPLING(z(-), 7(-), c) > Generate random sample from dis- 
tribution with un-normalised density 
; C) 
x~H(-) > Propose sample 


2 

3 

4: return (x, w). 
5: end procedure 


To see that this weighting has the desired effect, consider the expectation which 
is our objective. We first consider the situation where both x and 7 are normalised: 


AFON = f FOr dx 


T(x) l S 2 
= f f (x)= = Tai dx multiply and divide by 7 (x) 
2 


«(00s 


_ peasy 
= T | 


That is, we use samples directly from z(-), and instead adjust (2.4) to target the 
expectation of the same functional under v (-). 


uf farada x oie Foyt Eros 2A, 29) 


= er 


where now x; ~ 7(-). 

Some care is required, because although this estimator remains unbiased, the 
variance is no longer going to be the same as the usual Monte Carlo variance where 
x; ~ 7(-). Indeed, now 


2 (f (x) a(x) — pit (x))” 


g2 
Var(ù) = "z where o = z 
Q i (x) 


dx, 


which can be empirically estimated from the importance samples using, 
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A2 
(faw - ây. (2.10) 


As such, ož (or its empirical estimate ôZ) provide a guide to when we have a ‘good’ 
importance sampling algorithm, since with ^ (-) fixed the only option to improve the 
estimate is to increase the sample size n. 

Indeed, it can be shown [15] that the theoretically optimal proposal distribution 
which minimises the estimator variance is 


lf) |r) 
Jo lf Gla) dx 


T (x)opt = 


In particular, note that this implies that importance sampling can achieve super- 
efficiency whereby it results in lower variance even than sampling directly from 
m(-) when f(x) Æ x. Specifically, if f(x) > 0 Yx then this proposal results in a 
zero-variance estimator! Of course, in practice we cannot usually sample from and 
evaluate this optimal proposal, since it is at least as difficult as the original problem 
we were attempting to solve. However, even though these optimal proposals are often 
unusable, they provide guidance towards the form of a good proposal for any given 
importance sampling problem. 


2.3.3.1 Self-normalising Weights 


The option to use rejection sampling with un-normalised densities is very helpful (e.g. 
in Bayesian settings where the normalising constant is often unknown). We can retain 
this advantage with importance sampling by using self-normalising weights. The 
algorithm to generate the weights remains as in Algorithm 2.3, but the computation 
of the estimator in (2.9) changes. The self-normalised version, rather than dividing 
by n, uses the sum of the weights, 


a A ae f(xj)wj 
eer wie” 


thereby ensuring cancellation of the unknown normalising constant from the target 
and/or proposal distributions in the weights. 

However, it is important to note that this estimator is no longer unbiased, though 
asymptotically it is. Additionally, the variance of this estimator is more complicated 
having only approximate form. An approximate estimate can be computed using, 
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6? 


n 
^x Ax x ARN 
Var (ju) ~ a where 67? = Yw? (f(x;) ê*) 
j=1 
Wj 
iat Wi 


Finally, the theoretically optimal (but usually unusable) proposal in the self- 
normalised weight case is 


and wi = 


T(X)opt X | f(x) — alr (x). 


In both regular and self-normalised weight settings, one can then compute appro- 
priate confidence intervals in the usual manner. 


2.3.3.2 Diagnostics 


Additional care is required in the application of importance sampling when compared 
to using iid samples from the distribution of interest. In particular, because importance 
sampling uses a weighted collection of samples, itis not uncommon to be in a situation 
where a small number of samples with large weight dominate the estimate, so that 
simply having many importance samples does not equate to good estimation overall. 

A common diagnostic for potential weight imbalance is derived by equating the 
variance of a weighted importance sampling approach to the standard iid Monte Carlo 
variance for an average computed using a fixed but unknown sample size ne. Upon 
simple algebraic rearrangement one may then solve for ne, the so-called effective 
sample size. This informally corresponds to the size of iid Monte Carlo sample one 
would expect to need to attain the same variance achieved via this importance sample, 
so that a low value indicates poor weight behaviour (since that corresponds to few 
iid samples). 


Var (=a) = i 3 na? 
Viet Wi 


where 
n n 
- 1 = I 
w= | - ) wj and w? = — y wa. 
j 
n 4 n* 
j=l j=l 


The reason to use such a diagnostic and not simply rely on the empirical variance 
estimates above is that they are themselves based on the sampling procedure and 
therefore may be poor estimates too. 

Finally, it is critical to note that although small ne does diagnose a problem with 
importance sampling, it is not necessarily true that large ne means everything is ok: 
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itis, for example, entirely possible that the sampler has missed whole regions of high 
probability. 


2.3.3.3 Example 


Consider the toy problem of computing P(X > 3.09) when X is a univariate standard 
Normal random variable. The R-language [26] computes the distribution function 
of the standard Normal to at least 18 significant digits using a rational Chebyshev 
approximation [6] (see /src/nmath/pnorm.c) and we know the true answer 
to be 0.001001 (4 sf). However, if the reader suspends disbelief and imagines that 
we cannot accurately compute the distribution function, but can only compute the 
Normal density and draw random realisations from it, then evaluation of the above 
probability might be approximated using Monte Carlo methods instead (given the 
unbounded support and extreme tail location this may be preferred to numerical 
integration). 

Since we are assuming the ability to generate random realisations from the 
Normal distribution, a standard Monte Carlo approach would draw many samples 
x; ~ N(O, 1) and compute 


1 n 
P(X > 3.09) = E[I3.09,00)(X)] = = J 1.09,03) 
j=l 


However, this will require many samples to achieve an accurate estimate of this tail 
probability. 

In contrast, still only using simulations from a Normal distribution, we may elect 
to use importance sampling with a proposal N(m, 1) for some choice m. We know 
the fully normalised density of a Normal distribution and therefore will be using the 
estimator (2.9) with associated single sample variance which can be approximated 
using (2.10). Therefore, to select m, we perform a small grid search over possible 
proposals, computing ô? each time, to find a good choice. This results in Fig. 2.4, 
showing that a proposal N(3.25, 1) is a good choice. 

A further final run of n = 100,000 samples renders an estimate ĝ = 0.001002 (4 
sf). The same pseudo-random number stream using standard Monte Carlo renders an 
estimate 0.001140 (4 sf), which is a relative error 163 x larger than the importance 
sampling estimate. To demonstrate this is not a ‘fluke’ result, we continue to repeat 
both importance sampling and standard Monte Carlo estimation with runs of size 
n = 100,000 and plot the density of estimates of P(X > 3.09) in Fig. 2.5. 

Note that Fig.2.5 demonstrates how much more accurate importance sampling 
is for the same sample size n = 100,000 when computing this event probability 
compared to standard Monte Carlo. One may reasonably object that we have ignored 
the 25 pilot runs of n =100,000 importance samples used to select m = 3.25, so 
that the total computational effort expended on importance sampling was at least 
26x that of standard Monte Carlo. However, it is a simple calculation to determine 
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Fig.2.4 The estimate of 62 using (2.10) for 25 different values of m in the Normal proposal N(m, 1) 
used in an importance sampling estimator of P(X > 3.09), where X ~ N(O, 1). Each estimate of 
ae is based on n = 100, 000 samples. m varies on an equally spaced grid from 1.5 to 4.5. The 
minimum is at m = 3.25 
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Fig. 2.5 A total of 10, 000 runs of both importance sampling and standard Monte Carlo, each of size 
n = 100,000. Each run was used to compute the estimate of P(X > 3.09) where X ~ N(O, 1) and 
a kernel density plot of these estimates produced (importance sampling = dashed, standard Monte 
Carlo = solid). The vertical line is the ground truth computed using pnorm(3.09, lower.tail 
= FALSE). The same pseudo-random number stream was used for each method to ensure a fair 
comparison 


that based on the standard deviation of the samples used to generate Fig.2.5 and 
the ./n convergence of Monte Carlo, that it would require a standard Monte Carlo 
sample of size n = 295 x 100,000 to achieve the same accuracy profile as importance 
sampling. Therefore, even accounting for the pilot computational effort to select a 
proposal distribution, there is a substantial benefit to using importance sampling. 


2.4 Further Reading 


A textbook length introduction with a solid emphasis on implementation details in 
R can be found in [28]. The same authors have a more advanced textbook going into 
the theoretical aspects more deeply [27]. Both these books also introduce Markov 
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Chain Monte Carlo methods, which are often used in practice in high dimensional 
problems. A nice tutorial paper introduction to MCMC is [1] and [4] is an excellent 
collection of chapters on the topic. 

A classic Monte Carlo text is [10], which is now freely (and legally) available 
online and contains many results not easily found elsewhere. 

Although standard Monte Carlo and Markov Chain Monte Carlo arguably rep- 
resent the mainstay of most practical uses of Monte Carlo, there are an array of 
advanced methods which are particularly well suited to different settings. Some excel- 
lent review texts as jumping off points to explore some of these include [8] (Sequen- 
tial Monte Carlo), [19, 20] (Approximate Bayesian Computation), [14] (Multi-Level 
Monte Carlo), and [2] (MLMC in engineering reliability). 

Excellent software choices for practically performing inference in complex mod- 
els via sampling methods includes Stan [5] and Birch [23]. 


Dedication 


This chapter is dedicated to the memory of Brett Houlding (1982-2019). The tragic 
news of Brett’s passing was received while I was in the act of writing this chapter. 
He will be sorely missed. 
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Chapter 3 A) 
Introduction to the Theory get 
of Imprecise Probability 


Erik Quaeghebeur 


Abstract The theory of imprecise probability is a generalization of classical ‘pre- 
cise’ probability theory that allows modeling imprecision and indecision. This is a 
practical advantage in situations where a unique precise uncertainty model cannot 
be justified. This arises, for example, when there is a relatively small amount of 
data available to learn the uncertainty model or when the model’s structure cannot 
be defined uniquely. The tools the theory provides make it possible to draw conclu- 
sions and make decisions that correctly reflect the limited information or knowledge 
available for the uncertainty modeling task. This extra expressivity however often 
implies a higher computational burden. The goal of this chapter is to primarily give 
you the necessary knowledge to be able to read literature that makes use of the theory 
of imprecise probability. A secondary goal is to provide the insight needed to use 
imprecise probabilities in your own research. To achieve the goals, we present the 
essential concepts and techniques from the theory, as well as give a less in-depth 
overview of the various specific uncertainty models used. Throughout, examples are 
used to make things concrete. We build on the assumed basic knowledge of classical 
probability theory. 


3.1 Introduction 


The theory of imprecise probability is a generalization of classical ‘precise’ proba- 
bility theory that allows modeling imprecision and indecision. Why is such a theory 
necessary? Because in many practical applications a lack of information—e.g., about 
model parameters—and paucity of data—especially if we also consider conditional 
models—make it impossible to create a reliable model. 

For example, consider a Bayesian context where a so-called prior probability 
distribution must be chosen as part of the modeling effort. The lack of information 
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may make it difficult to determine the type of the prior distribution, let alone its 
parameters. Then, even if we assume some prior has been chosen—e.g., a normal 
one—in a somewhat arbitrary way, a paucity of data will make the parameters of the 
posterior—updated—distribution depend to a large degree on the prior’s somewhat 
arbitrary parameters. The consequence is that conclusions drawn from the posterior 
are unreliable and decisions based on it somewhat arbitrary. 

The theory of imprecise probability provides us with a set of tools for dealing with 
the problem described above. For the example above, instead of choosing a single 
prior distribution, a whole set of priors is used, one that is large enough to sufficiently 
reduce or even eliminate the arbitrariness of this modeling step. The consequence is 
that conclusions drawn from an imprecise probabilistic model are more reliable by 
being less committal—more vague, if you wish; some would say ‘more honest’ —and 
that decisions based on it allow for indecision. 

In this chapter, we will go over the basic concepts of the theory of imprecise prob- 
ability theory. Therefore, we will consider ‘small’ problems, with finite possibility 
spaces. However, the theory can be applied to infinite—countable and uncountable— 
possibility spaces as well. Also, only the basics of more advanced topics such as 
conditioning will be touched upon. But, and this is the chapter’s goal, after having 
understood the material we do treat, the imprecise probability literature should have 
become substantially more accessible. Good extensive general treatments are avail- 
able [2, 16, 20] and the proceedings of the ISIPTA conferences provide an extensive 
selection of papers developing imprecise probability theory or applying it [1, 3, 4, 
6-12]. 

Concretely, we start with a discussion of the fundamental concepts in Sect. 3.2. 
This is done in terms of the more basic notion of sets of acceptable gambles. Probabil- 
ities only appear thereafter, in Sect. 3.3, together with the related notion of prevision 
(expectation). The connection with sets of probabilities is made next, in Sect. 3.4. 
Then we touch upon conditioning, in Sect. 3.5, and before closing add some remarks 
about continuous possibility spaces, in Sect. 3.6. Throughout we will spend ample 
time on a running example to illustrate the theory that is introduced. 


3.2 Fundamental Concepts 


In this section, we introduce the fundamental concepts of the theory of imprecise 
probability [18] [20, §3.7]. First, in Sect. 3.2.1, we get started with some basic 
concepts. Then, in Sect. 3.2.2, we list and discuss the coherence criteria on which 
the whole theory is built. 
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3.2.1 Basic Concepts 


Consider an agent reasoning about an experiment with an uncertain outcome. This 
experiment is modeled using a possibility space—a set—X of outcomes x. Now 
consider the linear space L = X — R of real-valued functions over the outcomes. 
We view these functions as gambles because they give a value, seen as a payoff, for 
each outcome and because the outcome is uncertain and therefore the payoff is as 
well. A special class of gambles are the outcome indicators 1, or subset indicators 1g, 
which take the value one on that outcome or subset and zero elsewhere. 

The agent can then express her uncertainty by specifying a set of gambles, called 
an assessment A, that she considers acceptable. Starting from such an assessment, 
she can reason about other gambles and decide whether she should also accept them 
or not. If she were to do this for all gambles, then the natural extension & of her 
assessment would be the set of all acceptable gambles. To reason in a principled way, 
she needs some guiding criteria; these are the next section’s topic. 

Let us now introduce our running example: 


Wiske and Yoko Tsuno want to bet on Belgium vs. Japan 


Given a sports match between Belgium and Japan, there is uncertainty about which 

country’s team will win. So we consider the possibility space {BE, JP}. There are to 
agents—gamblers—: Wiske and Yoko Tsuno, two comic book heroines. Each has an 
assessment consisting of a single gamble that they find acceptable: 


e Wiske accepts losing 5 coins if Japan wins for the opportunity to win 1 coin if 
Belgium wins; so Aw = {1ps — 5- lip}. 

e Yoko Tsuno accepts losing 4 coins if belgium wins for the opportunity to win 
1 coin if Japan wins; so Ay = {—4- lge + Lp}. 


The heroines are also discussing joining forces and forming a betting pool. The 
pools they consider are 


e ‘Simple’, formed by combining their assessments; so 
Aspe = {lpr — 5- lp, —4- leg + l}. 


e ‘Empty’ in case of disagreement, without any acceptable gambles; so Agp = Ø. 


3.2.2 Coherence 


In the theory of imprecise probabilities, the classical rationality criteria used for 
reasoning about assessments are called coherence criteria. These are typically for- 
mulated as four rules that should apply to any gambles f and g. (There are different 
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variants in the literature, but the differences are not relevant in this introductory text.) 
We divide the criteria into two classes. 


Constructive State how to generate acceptable gambles from the assessment: 
Positive scaling If f is acceptable and A > 0, then A - f is acceptable. 
Addition If f and g are acceptable, then f + g is acceptable. 


Background State which gambles are always or never acceptable: 
Accepting gain If f is nonnegative for all outcomes, then f is acceptable. 
Avoiding sure loss If g is negative for all outcomes, then g is not acceptable. 


These criteria are quite broadly seen as reasonable, under the assumption that the 
payoffs are ‘not too large’. 

The last criterion, ‘Avoiding sure loss’, puts a constraint on what is considered 
coherent; if it is violated, we say that an assessment incurs sure loss. The first three 
rules can be used to create an explicit expression for the natural extension: 


6=[Dypexdy f KEAU(FEL: f=} and Wf EK: àp =0)}, 


where € denotes the finite subset relation. Then & is the smallest convex cone of 
gambles encompassing the assessment A and the nonnegative gambles—including 
the zero gamble. 

Let us apply the natural extension to our running example: 


The natural extensions of Wiske, Yoko Tsuno, and the betting pools 


For our finite possibility space, 


{fEel: f =O = {P exh: ly: Wx eX: u = 0)} 


So, with Aa, Ux > 0 for all outcomes x and agent identifiers A, we get the following 
expressions that characterize the natural extensions: 


Aw: (pp — 5- lip) + Upe’ lpg + Mp: Lie 

= (Aw + Upp): [pp + (—5 - AW + wp): lp, 
ày- (—4- lge + lp) + Upe: lge + Wr: Lie 

= (—4 : ày + Upp): lgg + Ay + mp) Le, 
Simple pool (Aw — 4: Ày + Mpg): lgg + (—5 AW + ày + mp) lp, 
Empty pool (gg + lge + Mp: lp. 


Wiske 


Yoko Tsuno 


To check whether the natural extension incurs sure loss, we must check whether the 
coefficients of 1; and 1, can become negative at the same time. Only the simple pool 
incurs sure loss; e.g., fill in Aw = Ay = 1 and Upp = Hyp = 0 to convince yourself. 
(Convince yourself as well that the others avoid sure loss indeed.) 


3 Introduction to the Theory of Imprecise Probability 41 


3.3 Previsions and Probabilities 


In this section, we move from modeling uncertainty using sets of acceptable gam- 
bles to the more familiar language of expectation—or, synonymously, prevision— 
and probability [17]. We first transition from acceptable gambles to previsions in 
Sect. 3.3.1 [18, §1.6.3] [17, §2.2] and in a second step, in Sect. 3.3.2, give the con- 
nection to probabilities [20, §2.6]. Next, in Sect. 3.3.3, we consider assessments in 
terms of previsions and what the other fundamental concepts of Sect. 3.2 then look 
like [17, §2.2.1, §2.2.4] [20, §2.4-5, §3.1]. Finally, in Sect. 3.3.4, we consider the 
important special case of assessments in terms of previsions defined on a linear space 
of gambles [17, §2.2.1] [20, §2.3.2-6]. 


3.3.1 Previsions as Prices for Gambles 


Before we start: ‘prevision’ is in much of the imprecise probability literature used 
as a synonym for ‘expectation’; we here follow that tradition. 

Now, how do we get an agent’s previsions for a gamble—equivalently: expecta- 
tion of a random variable—given that we know the agent’s assessment as a set of 
acceptable gambles A? We first define a price to be a constant gamble and iden- 
tify this constant gamble with its constant payoff value. Then we define the agent’s 
previsions as specific types of acceptable prices: 


e The lower prevision P(f) is the supremum acceptable buying price of f: 
P(f)=sup{veR: f-—ves}. 
e The upper prevision P (f) is the infimum acceptable selling price of f: 
P(f) = inf {e ER: « — f € &}. 
If & is coherent, then P and P are also called coherent. There is a conjugacy relation 
between coherent lower and upper previsions: P(f) = —P(— f). It allows us to 
work in terms of either type of prevision; we will mainly use the lower one. 


Incase P(f) = P(f), then P(f) = P(f) is the called the (precise) prevision of 
the gamble f. 


3.3.2 Probabilities as Previsions of Indicator Gambles 


Now that we have definitions for lower and upper previsions, we can derive probabil- 
ities from those. For classical probability, we have that the probability of an event—a 
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subset B of the possibility space X—is the prevision of the indicator for that event. 
For lower and upper previsions, we get: 


e The lower probability: P(B) = P(1g). 
e The upper probability: P(B) = P (1p). 


Notice that we reuse the same symbol for the prevision and probability functions, 
as is common in the literature. As long as the nature of the argument—gamble or 
event—is clear, this does not cause ambiguity. If P and P are coherent as previsions, 
then so are they as probabilities. Also the conjugacy relationship can be translated 
to coherent lower and upper probabilities; let B° = X \ B, then 


P(B) = P(1g) = PA — 1g) = —P(-1 + lace) = 1 — P(1g:) = 1 — P(B°). 


In case P(B) = P(B), then P(B) = P(B) is called the (precise) probability of 
B. 

To make the definitions for lower and upper previsions and probabilities concrete, 
let us apply them to our running example: 


Lower and upper probabilities for all events and agents 
We work out the calculation of Wiske’s lower probability that Belgium will win. 


Pw (BE) =Pw (gp) (def. lower probability) 


= sup {v ER: vE Ew} (def. lower prevision) 


l-v Aw + 
sup{veR:] i] =[_ 5 SY TMB] awo uae 2 0. su > of 


(write out natural extension Ew of Aw) 


sup {5+ Aw — pup : 1 — 5- Aw + Hir = Aw + Mae, Aw = 0, Mae = 0, ur > O} 
(eliminate v) 
sup {5 - Aw — Hp : Aw = §(1 + Hr — Mae), Aw = O, Hpg 2 0, Hi = 0} 


(solve constraint for Aw) 


= sup {3 = tuw = Š Use : 1 + Wre = Hepe» Hee Z O, Mar = o} (eliminate Aw) 


5 
= z (feasible solution ugg = 0, Up = 0 maximizes expression) 


Do the calculations also for the other agents and Japan. Then apply conjugacy to find 
the following table of lower and upper probabilities: 


Agents P(BE), P(BE) PQP), POP) Note 


Wiske 5/6,1 0,1/6 Will not bet against Belgium 
Yoko Tsuno 0,1/5 4/5,1 Will not bet against Japan 
Simple pool +00, —oo +oo,-—oo Sure loss, so absurd bounds 


Empty pool 0,1 0,1 So-called vacuous model 
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While in the above example the calculation of the lower prevision can be done by 
hand, in general it realistically requires a linear program solver. 


3.3.3 Assessments of Lower Previsions 


Up until now, we assumed a set of acceptable gambles A—an agent’s assessment— 
to be given. But often the agent will directly specify lower and upper probabilities 
or previsions, e.g., as bounds on precise probabilities and previsions. However, the 
coherence criteria and expression for the natural extension are based on having a 
set of acceptable gambles. In this section we will provide expressions based on an 
assessment specified as lower prevision values for gambles in a given set K. 

The approach is to derive an assessment as a set of acceptable gambles A from 
these lower prevision. Irrespective of what its natural extension & actually looks like, 
it follows from the definition of the lower prevision as a supremum acceptable buying 
price that 


0<sup{v-—P(f):vEeRAf—-veEDA} 
=supfk ER: f—(« +P(f)) €E} =sup{« eR: (f —P(f))-K EE}. 


This implies that f — P(f) + £ € & for any £ > 0, because of coherence. We cannot 
take £ = 0, because the corresponding so-called marginal gamble f — P( f) is not 
included in & in general, as the supremum value « = 0 is not necessarily attained 
inside the set. We therefore take A = U rex {f —P(f)+te:e> o}. 

We can then apply the theory described above to this assessment A. This leads to 
the following nontrivial results for a lower prevision P defined on a set of gambles K: 


e It avoids sure loss if and only if for all n > 0 and fp € K it holds that 


sup X (fe(x) — Po) = 0. 


xXEX ki 


e Itis coherent if and only if for all n,m > 0 and fg € K it holds that 


sup (> (A) — P) -m - (fo - cw) > 0. 


xEX \k=1 


e Its natural extension to any gamble f in £ is 


E(f) = sup lag [o -Y dg (Fe) -za n>, fk EK, dg > | . 


k=1 
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3.3.4 Working on Linear Spaces of Gambles 


The coherence criterion for lower previsions on an arbitrary space K of gambles we 
gave in the preceding section is quite involved. However, in case K is a linear space of 
gambles, this criterion becomes considerably simpler. Namely, a lower prevision P 
must then satisfy the following criteria for all gambles f and g in K and A > 0: 


Accepting sure gains P(f) > inf f, 
Super-linearity P(f + g) = P(f) + P(g), 
Positive homogeneity P(Af) =A- P(f). 


Expressed for upper previsions P, these coherence criteria are very similar: 


Accepting sure gains P(f) < sup f, 
Sub-linearity P(f +g) < P(f)+ P(g), 
Positive homogeneity P(Af) =à- P(f). 


From the coherence criteria, many useful properties can be derived for a coherent 
lower prevision P and its conjugate upper prevision P. We provide a number of key 
ones, which hold for all gambles f and g in K and u € R; P denotes either P or P: 


Upper dominates lower P(f) > P(f), 
Constants P(u) = u, 
Constant additivity P(f + u) = P(f)+u, 
Gamble dominance if f > g + u then P(f) > P(g) + n, 
Mixed sub/super-additivity P(f + g) < P(f)+ P(g) < P(f +8). 


3.4 Sets of Probabilities 


In Sect. 3.2 we modeled uncertainty using a set of acceptable gambles. In Sect. 3.3 
we showed how this can also be done in terms of lower or upper previsions (or 
probabilities). In this section, we add a third representation, one using credal sets— 
sets of precise previsions [17, §2.2.2], [18, §1.6.2]. In Sect. 3.4.1 we show how to 
derive the credal set corresponding to a given lower prevision. In Sect. 3.4.2 we go 
the other direction and show how to go from a credal set to lower prevision values 
[20, $3.3]. 


3.4.1 From Lower Previsions to Credal Sets 


A credal set is a subset of the set of all precise previsions P. (For possibility spaces B 
different from X, we write Pg.) This set is convex, meaning that any convex mixture 
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of precise previsions is again a precise prevision. Because of this, a gamble’s prevision 
is a linear function over this space. A lower—and upper—prevision can be seen 
as providing a bound on the value of the precise prevision for that gamble and 
thereby represent a linear constraint on the precise previsions. So the credal set M 
corresponding to a lower prevision P defined on a set of gambles K is the subset 
of P satisfying this constraint for all gambles in K: 


M=(){PeP:P(f) > PA}. 


SEK 


Being defined as such an intersection, such credal sets are closed and convex. 
The rationality criteria for a lower prevision P we encountered before can also 
be expressed using its corresponding credal set M: 


e P incurs sure loss if and only if M is equal to the empty set. 
e P is coherent if and only if all constraints are ‘tight’, i.e., if there exists a P in M 
such that P(f) = P(f) for all f in K. 


Let us make the concept of a credal set concrete using our running example: 


Yoko Tsuno’s credal set 


For a finite possibility space such as the one of our running example, a precise previ- 
sion P can be defined completely by the corresponding probability mass function p 
defined by p, = P({x}) for x in X = {BE, JP}. The set of all precise previsions 
can therefore be represented by the probability simplex—the set of all probability 
mass functions—on &. This set and the example probability mass function G, 5) 
is shown below left. Below right, we illustrate how Yoko Tsuno’s lower previ- 
sion Py (JP) = z generates the credal set My: The gamble 1, as a linear function 
over the simplex is shown as an inclined line. This linear relationship between p— 
equivalently, the corresponding prevision P,—and P, (1p) = P, (JP) transforms the 
bounds ¢ < P,(JP) < 1 into My. 


xP, (UP) = 0+ Prr + 1+ Pye 


14 
Py (sp) = 24 


P = (Prr, Pw) 


The set of extreme points MỌ of My as probability mass functions is [G 4), (0, 1) |. 
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3.4.2 From Credal Sets to Lower Previsions 


Now we assume that the agent’s credal set M is given. Most generally this can be any 
set of precise previsions, i.e., any subset of P. Often, to ensure equivalence between 
coherent lower previsions and non-empty credal sets, they are required to be closed 
and convex. In that case, a credal set is determined completely by its set of extreme 
points M* in the sense that all other elements are convex mixtures of these. 

To determine the lower prevision corresponding to any credal set, we determine 
its value for each gamble f of interest using the lower envelope theorem: 


P(f) = min {P,(f) : p € M} = min{Pp(f): pe M*}. 


Let us again use the running example to provide a feeling for what this all means: 


A credal set for the empty pool facing penalties 


Consider the empty pool. Because its assessment is empty, its credal set Mep is the 
trivial one corresponding to all probability mass functions on X = {BE, JP}. Now we 
add an extra element to the possibility space, ‘Penalties’. Below left we show Mep 
embedded in the corresponding larger probability simplex. Wiske and Yoko Tsuno 
decide to add the uniform probability mass function to it. Below right, you see the 
convex hull Mgup of this extra probability mass function and the original credal set. 


P(enalties) P(enalties) 


BE i Mep f JP BE JP 


(0,1,0) 


(1,0, 0) 


If we want to calculate lower and upper prevision values, we can here use the extreme 
point version of the lower and—similar—upper envelope theorem. For example, for 
the pool’s upper probability for Penalties: 


— — 1 
Pep (P) = Pep(1p) = max [pT (0,0, 1) : p € (0,0,0, 0.10, G5 9} t = 5. 
To make it explicit where this maximum is achieved, we above right show the line 


of probability mass functions p such that P, (P) = i. 
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3.5 Basics of Conditioning 


Conditioning an uncertainty model is the act of restricting attention to a subset B 
of the possibility space. It is often used to update an uncertainty model after having 
observed the event B [20, §6.1]. 

In the theory of imprecise probability, conditioning is a specific case of natural 
extension [17, §2.3.3], [20, §6.4.1]. In terms of acceptable gambles, conditioning 
on B corresponds to restricting the space of gambles to those that are zero outside B 
[18, $1.3.3]. For lower previsions, this translates to the following conditioning rule 
for all gambles f in £: 


inf if P(B) = 0, 
E(f lay ymsee S@) ioe 

max {u ER: P(lg(f — 4) =0)} if P(B) > 0. 
Conditioning a credal set M corresponds to taking the credal set M|B formed by 
conditioning each of the precise previsions in M: 


MIB = Pe ifJ3P € M: P(B)=0, 
— {PCIB): P€ M} ifVP € M: P(B) > 0. 


These rules based on natural extension give vacuous conditionals whenever the 
lower probability of the conditioning event is zero. Regular extension is a less impre- 
cise updating rule [17, §2.3.4], [18, §1.6.6], [20, App. J]: In credal set terms, it 
removes those precise previsions P such that P (B) = 0 from M. 

Let us apply the conditioning rules discussed here to our running example: 


Conditioning the empty-uniform pool’s credal set 


We condition the empty-uniform pool’s credal set on {JP, P}, i.e., Belgium not 
winning in regular time. Further down on the left, we show what happens if we 
apply natural extension: the conditional model is vacuous because P4 9,9) ({JP, P}) = 
Pa,o0,0 (lur) = C1, 9, 0)! (0, 1, 1) = 0. Further down on the right, we apply natu- 
ral extension and therefore remove P(,0,0) from Meup; this results in a non-vacuous 
conditional credal set. 
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P(enalties) P(enalties) 
Meup| {JP, P} 
Meryp| {JP, P} 
BE Z JP BE JP 
(1,0,0) (0, 1, 0) (0, 1, 0) 


3.6 Remarks About Infinite Possibility Spaces 


The theory we presented is also applicable to denumerable and continuous pos- 
sibility spaces with some technical amendments to the coherence criteria and by 
considering only bounded gambles. However, the running example was based on 
finite possibility spaces, so no feeling was created for applications with infinite pos- 
sibility spaces. Therefore we here give some remarks about imprecise probabilistic 
uncertainty models on continuous possibility spaces: 


They are mostly defined using credal sets whose extreme points are parametric 
distributions where the parameters vary ina set. A prime example are the imprecise 
Dirichlet model [21] and its generalizations [19]. 

They are also commonly defined using probability mass assignments to subsets of 
the possibility space. This is in some way a reduction to the finite case. Examples 
are belief functions [13, §5.2.1.1], some P-boxes [14, §4.6.4], and NPI models [5, 
§7.6]. 

Furthermore, models which bound some specific description of a precise prevision, 
such as cumulative distribution functions and probability density functions, are also 
popular in some domains. The extreme points of their credal set are, however, not 
known. General P-boxes [15] and lower and upper density functions [20, §4.6.3] 
are examples of this class. 

Calculating lower and upper previsions—i.e., performing natural extension—can 
easily become difficult optimization problems, so this should be a key considera- 
tion when choosing a specific type of model. 
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3.7 Conclusion 


This introduction to the theory of imprecise probability has prepared you for access- 
ing the broader literature on this topic and its applications. For those that wish to 
apply imprecise probabilistic techniques, this text only provides the first step: You 
should dive into the literature and contact experts to obtain the necessary knowledge 
and feedback. The references of this chapter and their authors or editors provide a 
starting point for that. 
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Chapter 4 ®) 
Imprecise Discrete-Time Markov Chains sv 


Gert de Cooman 


Abstract I present a short and easy introduction to a number of basic definitions 
and important results from the theory of imprecise Markov chains in discrete time, 
with a finite state space. The approach is intuitive and graphical. 


4.1 Introduction 


Although imprecision and robustness in discrete-time Markov chains were already 
studied in the 1990s [6-8], more significant progress [2, 3, 5, 11] could be made after 
the graphical structure of imprecise probability trees underlying them was uncovered 
in 2008 [4]. Research has now moved firmly into the continuous-time domain, for 
which [1, 9] are good starting points. 

In this paper, I give a concise and elementary overview of a number of basic 
ideas and results in discrete-time imprecise Markov chains, with an emphasis on 
their graphical representation. We begin with the basics of precise and imprecise 
probability models in Sects. 4.2 and 4.3. When such models are used in a dynami- 
cal context, precise and imprecise probability trees arise naturally; they and the use 
of the fundamental Law of Iterated Expectations for making inferences about them 
constitute the subjects of Sects. 4.4 and 4.5. Imprecise Markov chains correspond 
to special imprecise probability trees, and they and their basic inferences are dis- 
cussed in Sect. 4.6, followed by a number of examples in Sect. 4.7. These examples 
hint at stationary distributions and ergodicity. These notions are briefly discussed 
in Sect. 4.8, which concludes the paper. Throughout, I have included a number of 
simple exercises to illustrate the arguments in the main text. 
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4.2 Precise Probability Models 


Assume we are uncertain about the value that a variable X assumes in some finite 
set of possible values X. This is usually modelled by a probability mass function m 
on X, satisfying (Vx € ¥)m(x) > 0 and rex m(x) = 1. 

With m we can associate an expectation operator Em as follows 


En(f) := 5 m(x) f(x) where f: X > R. 


xEX 


If AC X is an event, then its probability is given by P,,(A) = ae m(x) = 
Em(,4), where I4: X — R is the indicator of A and assumes the value 1 on A 
and 0 elsewhere. This tells us that there are two equivalent mathematical languages 
for dealing with uncertainty: the language of probabilities and the language of expec- 
tations, and that we can go freely from one to the other. 

All possible (precise) probability models are gathered in the simplex Xx of all 
mass functions on V: Ly := {m e RY: (Vx € X)m(x) > 0 and Pex m(x) = 1}. 
Any probability model for uncertainty about X is a point in that simplex, which indi- 
cated that mass functions have a geometrical interpretation. This is illustrated below 
for the case X = {a, b, c} and the uniform mass function mu. 


Èx E(2hp; — lie) =0 


b a 


E(la)) = 1/2 


Expectation also has a geometrical interpretation: specifying a value E(f) for the 
expectation of a map f: ¥ > R, namely, X ex m(x) f(x) = E(f), imposes a 
linear constraint on the possible values for m in Xx. It corresponds to intersecting 
the simplex Xx with a hyperplane, whose direction depends on f. This is also 
illustrated in the picture above; in this particular case two assessments turn out to 
completely determine a unique mass function. 


4.3 Imprecise Probability Models 


We now turn to a generalisation of precise probability models, which we will call 
imprecise. To allow for more realistic and flexible assessments, we can envisage 
imposing linear inequality—rather than equality—constraints on the m in Xx: 


E(f) < mofa or X mfa) < ES). 


xEX xEX 
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This corresponds to intersecting & with affine semi-spaces: 


A,A, A, A, A 
a b a b a b a b a b 


Any such number of assessments leads to a credal set M, which is our first type of 
imprecise probability model. 


Definition 4.1 A credal set M is a convex closed subset of Uy. 


Below, we show some more examples of such credal sets in the special case ¥ = 
{a, b, c}. The credal set on the left corresponds to the assessment: ‘b is at least as 
likely as c’; the one in the middle is a convex mixture of the uniform mass function 
with the entire simplex; and the one on the right represents a statement in classical 
propositional logic: ‘X = a or X = c’. This illustrates that the language of credal 
sets encompasses both precise probabilities and classical propositional logic. 


Cc C c 


a b a b a b 


Lower and upper expectations are our second type of imprecise probability model. 
To see how they come about, consider the credal set in the figure below on the right. 


c 


2x i Ellet 
E(Ke}) = "/4 


a b 


We can ask what we know about the probability of c, or the expectation of He}, 
given this credal set: it is only known to belong to the closed interval [1/4, 4/7]. This 
can be generalised from events to arbitrary elements of the set L(V) = R* of all 
real-valued maps f on ¥: As m ranges over the credal set M, Em(f) will similarly 
range over a closed interval that is completely determined by its lower and upper 
bounds. 

This leads to the definition of the following two real functionals on L(Y): 


Em(f) = min {E,,(f): m E€ M} lower expectation 
= forall f: ¥ > R. 
E m(f) = max {Em( f): m E€ M} upper expectation 


Observe that these lower and upper expectations are mathematically equivalent mod- 
els, because 
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Em(f) =—Em(—f) forall f € £(4). 
We will in what follows focus on upper expectations. 


Exercise 4.1 What is the upper expectation E m when M = Dx? 


c 


Ix _ 
Solution: E m(f) = max f. 


This shows that we can go from the language of probabilities—and the use of MM— 
to the language of expectations—and the use of E m. To see that we can also go the 
other way, we need the following definition: 


Definition 4.2 We call areal functional E on £L(4’) an upper expectation if it satisfies 
the following properties: for all f and g in £(¥) and all real A > 0: 


1. E(f) < max f [boundedness], 
2. E(f +g) < E(f) + E(g) [sub-additivity]; 
3. EAF) = E(f) [non-negative homogeneity]. 


Upper expectations are also called coherent upper previsions [10, 12]. They constitute 
a model that is mathematically equivalent to credal sets, in very much the same way 
as expectations are mathematically equivalent to probability mass functions: 


Theorem 4.1 A real functional E is an upper expectation if and only if it is the 
upper envelope of some credal set M. 


Proof Use M = {m € Ex: Yf € L(X))(Em(f) < E(f))}.- 


Exercise 4.2 Consider any linear prevision E,,, and any € € [0, 1]. Verify that the 
so-called linear-vacuous mixture: 
E 


/ N E(f) = (1 - €)Em(f) + € max f 


a b 


is an upper expectation. 

Solution: Em and max are upper expectations by Theorem 4.1, because they are upper 
envelopes of the respective credal sets {m} and X y—see Exercise 4.1. Now verify 
that being an upper expectation is preserved by taking convex mixtures. The corre- 
sponding credal set (1 — e){m} + €ÈEx := {(1 — e)m + €q: q € Xx} is indicated 
in blue in the figure above. © 
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Exercise 4.3 All upper expectations on a binary space ¥ = 
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{0, 1} are such linear- 


vacuous mixtures, and the corresponding credal set can be depicted as 


p 
0 


Let p := m(1) and q := 1 — p 
p,e? 


= m(0). What is the relation between [p, 


1 


p] and 


Solution: p = E(lņ) = (1 — e)p = p — ep and p= E([y)) =(1-e)pte= 


p + €q. Hence, p — p = €. 


© 


4.4 Discrete-Time Uncertain Processes 


We now apply these ideas in a more dynamic context: the study of processes. We 
consider an uncertain process, which is a collection of uncertain variables X1, X2, ..., 
Xn, -.. assuming values in some finite set of states X. This can be represented graphi- 
cally by a standard event tree with nodes (also called situations) s = (x1, X2, ..., Xn) 
for xk € X and n > 0. This is depicted below on the left for the special case that 

= {0, 1}, where we have limited ourselves to three variables X1, X2, and X3; but 


the idea should be clear. Observe that we use the symbol 


or root node, of the event tree. 


(0, 0, 0) 


0,0) eon 
oD = 
— 0,1,1 
: a, p 
— a0, 
i (1,0) 
x! a 1,1) 


x 


for the initial situation, 


(0, 0) pm) a 


1 
0 mo oom 


(0, 1,0 

0,1 

( To 
(1, 0, 0) 


(1, 0) Ta, o) 


—- 
1 XJmı 


1,1,0 
d, pm l 


D 


The event tree becomes a probability tree as soon as we attach to each node 


S = (X1, X2, ..., 


Xn) a local probability mass function m, on XY with associated 


expectation operator Em,, expressing the uncertainty about the next variable X„+1ı 


after observing the earlier variables X; = x1, ..., 
on the right for the special case that ¥ = 


(0, 1}. 


Xn = Xn. This is depicted above 


We now consider a very general inference problem in such a probability tree. 
Consider any function g: X” — R of the first n variables: g = g(X), X2,..., Xj). 


We want to calculate its expectation E (g|s) in the situation s = (x,..., 


xx), that is, 


after having observed the first k variables. Interestingly, this can be done efficiently 
using the following theorem, which is a reformulation of the Law of Total Probability: 
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Theorem 4.2 (Law of Iterated Expectations) Zf we know E(g|s, x) for all x € X, 
then we can calculate E(g|s) by backwards recursion using the local model ms: 


E(g|s) = Em (E(gls,-)) = X ms (x) E(g|s, x). 


local xE 


This shows that expectations can be calculated recursively using a very basic step, 
illustrated below for the case ¥ = {0, 1}: 


(s, D <— E(gls, 1) 
E(g\s) = ms(1)E(gls, 1) + ms O)E(gls, 0) <— s Lis 
(s, 0) <— E(g|s, 0) 


Hence, all expectations E(g|x,,..., x,) in the tree can be calculated from the local 
models m, as follows: 


1. start in the final cut X” and let E(g|x,, x2,...,X%n) = 8 (x1, X2, <- <, Xn); 
2. do backwards recursion using the Law of Iterated Expectations: 


E(g|x1, PE Xx) = Eine...) E elx, vee Xk, -)) 
G 


local 


3. go on until you get to the root node L, where we can identify E (g|O) = E (g). 


Exercise 4.4 Consider flipping a coin twice independently, with probability p for 
heads—outcome 1—and q = 1 — p for tails—outcome 0. The corresponding prob- 
ability tree for this experiment is given below on the left, with, in red, in the nodes, 
the corresponding number of heads. What is the expected number of heads? 


q 
repite 


af Fei 4/ | 
A 
T 


2p=p+(p-1+4-0) 
1 


PN ee 


l+p=2p+q=p-:2+¢q-1¢ 1 


=x 


Solution: Above on the right, we apply the Law of Iterated Expectations recursively, 
from leaves to root; the solution is the expectation 2p attached to the root. © 


2 


Exercise 4.5 Extend the ideas in the solution to Exercise 4.4 to calculate the expected 
number of heads when the coin is flipped n times independently. 

Solution: We apply the Law of Iterated Expectations recursively, from leaves to root. 
Below on the left, we consider starting from the leaves of the tree at depth n; applying 
the Law reduces to adding p to the number of heads in each of their parent nodes at 
depth n — 1. On the right, we apply the Law to these nodes at depth n — 1, which 
reduces to adding 2p to the number of heads in each of their parent nodes at depth 
n—2., 
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at time n — 1 in a situation with k heads at time n — 2 ina situation with k heads 


wee se 
Petree ed ae a dae a, 
2 BO pak k+p+1 


n—-1 i n—2 n—1 


Going on in this way, we see that the solution is the expectation np attached to the 
root at depth 0. © 


Exercise 4.6 We now flip the same coin time and time again, independently, until 
we reach heads for the first time. Calculate the expected number of coin flips. 
Solution: Below is the (unbounded) probability tree associated with this experiment. 


p! 
peee 
æ 4 p (0, 1) 


=— >a+1p 
FT 0.0) 


(0, 0, 1) 


depth 1 


Call the unknown expectation œ. We apply the Law of Iterated Expectations to 
the situations at depth 1. In the situation 1, the expected number of heads is 1, the 
actual number of heads there. In the situation 0, we see a copy of the original tree 
extending to the right, but since we have already flipped the coin once here, the 
expected number of heads in this situation is w + 1. In the parent node, the expected 
number of heads a is therefore also given by p- 1 + q - (& + 1) = 1 + qa, whence 
a = Yp Q 


4.5 Imprecise Probability Trees 


Until now, we have assumed that we have sufficient information in order to specify, 
in each node s, a local probability mass function m, on the set ¥ of possible values 
for the next state. 


(s, 1) 


(s, 0) 


(s, 1) 


> a 


(s, 0) 


We now let go of this major restrictive assumption by allowing for more general 
uncertainty models. We will consider credal sets as our more general local uncertainty 
models: closed convex subsets Ms of Xx. See the figure below for a special case 
when ¥ = {0, 1}. 


Definition 4.3 An imprecise probability tree is an event tree where in each node s 
the local uncertainty model is a credal set M,, or equivalently, its associated upper 
expectation E,, with E,(f) := max {E,,(f): m € Ms} for all f € £L(X). 
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An imprecise probability tree can be interpreted as an infinity of compatible precise 
probability trees: choose in each node s a probability mass function m, from the 


set M,. 
(0, 0, 0) 


(0, 0) Mog 


0, 0, 1) 
0JMo 
(0, 1, 0) 


0,1) | 
Mop 1, 1) 


(1, 0, 0) 


(1, 0) [Hay 


1,0, 1) 


1JMı 
(1,1,0) 


a, 1) [May 


(i, 1, 1) 


For each real map g = g(Xj,..., Xn), each node s = (x1, ..., Xg), and each such 
compatible precise probability tree, we can calculate the expectation E (g|x1, ..., Xk) 
using the backwards recursion method described before. By varying over each com- 
patible probability tree, we get a closed real interval, completely characterised by 
lower and upper expectations E (g|x1, ... Xx) and E(g|x1, 2 XE) LE (g(x, ---, Xk), 
E(g|x, ...,%,)]. The complexity of calculating these bounds in this way is clearly 
exponential in the number of time steps n. But, there is a more efficient method to 
calculate them: 


Theorem 4.3 (Law of Iterated Upper Expectations [4, 5]) Jf we know E(g\s, x) for 
all x € X, then we can calculate E(g|s) by backwards recursion using the local 
model E: 


E = E, (E(g\s,-)) = E ; 
(sls) = E, (E(gls,)) = max /m(x) E(gls, x) 

local xeX 
This shows that expectations can be calculated recursively using a very basic step, 
illustrated below for the case ¥ = {0, 1}: 


7 o (s, 1) <— E (gls, 1) 
E(g\s) = E, Ells, )) — s Is = 
(s, 0) <— E(g|s, 0) 


The method for, and the complexity of, calculating the E(g|s), as a function of n, is 
therefore essentially the same as in the precise case! 


Exercise 4.7 Extend the ideas in the solution to Exercise 4.5 to calculate the upper 
expected number of heads when the coin is flipped n times independently, but where 
now we have an imprecise probability model for a coin flip, with a probability interval 
[p, p] for heads, and a corresponding interval [¢, g] = [1 — p, 1 — p] for tails. 

Solution: We apply the Law of Iterated Upper Expectations recursively, from leaves 
to root. Below on the left, we consider starting from the leaves of the tree at depth n; 
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applying the Law reduces to adding Pp to the number of heads in each of their parent 
nodes at depth n — 1. 


at time n — 1 ina situation with k heads at time n — 2 in a situation with k heads 
ute T 
k&>k+P = k+ Ely) ko k+2p=k+D+ Ep 
L : lo, 
L DI k+1 OT kt p+ 


eae aes | 
n—1 n nos n—-1 


On the right, we apply the Law to these nodes at depth n — 1, which reduces to 
adding 2p to the number of heads in each of their parent nodes at depth n — 2. Going 
on in this way, we see that the solution is the expectation np attached to the root at 
depth 0. A similar result holds for the lower expectation. © 


Exercise 4.8 We now flip the same coin with the imprecise probability model time 
and time again, independently, until we reach heads for the first time. Calculate the 
upper expected number of coin flips. 

Solution: Below is the (unbounded) probability tree associated with this experiment. 


YP 1 


P) 
ai wP 01) 


Loa] P 


ee a O ee 


2 g~ (0,0) 


l2 


Call the unknown upper expectation œ. We apply the Law of Iterated Upper Expecta- 
tions to the situations at depth 1. In the situation 1, the upper expected number of heads 
is 1, the actual number of heads there. In the situation 0, we see a copy of the original 
tree extending to the right, but since we have already flipped the coin once here, the 
upper expected number of heads in this situation is œ + 1. In the parent node, the 
upper expected number of heads a is therefore also given by 1 + E (@ lo) = 1 + &q, 
whence œ = !/p. A similar result holds for the lower expectation. O 


The attentive reader will have observed that in all these simple exercises, we can 
also obtain the ‘imprecise’ result from the ’precise’ one by optimising over the 
single parameter p. We have to warn against too much optimism: in more involved 
examples, this will no longer be the case. 


4.6 Imprecise Markov Chains 


We now look at a special instance of a probability tree, corresponding to a stationary 
(precise) Markov chain. This happens when the precise local models m(,,___ x) only 


agate 


60 G. de Cooman 


depend on the last observed state x,— this is the Markov Condition—and also do not 
depend explicitly on the time step n: 


M(x, Xn) = qC |Xn) 
for some family of transition mass functions q(-|x),x € X. 


Definition 4.4 The uncertain process is a stationary precise Markov chain when all 
Ms are singletons {ms} and M(,,,..x,) = {4C |Xn)}, for some family of transition 
mass functions q (|x), x € ¥. 


gasi 


For each x € 4’, the transition mass function q (-|x) corresponds to an expectation 
operator, given by E(f |x) = ya q(z|x) f(z) for all f € L(¥). 


Definition 4.5 Consider the linear transformation T of £(V), called transition oper- 
ator: T: L(X) > L(X): f => Tf, where Tf is the real map defined by: 


Tf (x) = E(flx) = $ glx) f@ forall x € X. 


ZEX 


In the parlance of linear algebra, or functional analysis, T is the dual of the linear 
transformation with Markov matrix M with elements M,, := q(y|x). 

Up to now, we have mainly been concerned with conditional expectations of the 
type E(-|s). We will now look at particular unconditional expectations, where s = 
For any n > 0, we define the expectation for the (single) state X, at time n by 


and we denote the corresponding mass function by m,. Applying the Law of Iterated 
Expectations in Theorem 4.2 now yields, with also E1 = Emp and mı = mp: 


E,(f) = E (T™! f), and dually, m, = M"~'m,, 


so the complexity of calculating E,,(f) is linear in the number of time steps n. 


Exercise 4.9 Consider the stochastic process where we first flip a fair coin. From 
then on, on heads, we select a biased coin with probability p for heads for the next 
coin flip, and on tails, a biased coin with probability q = 1 — p for heads, and keep 
on flipping one of the two biased coins, selected on the basis of the outcome of the 
previous coin flip. This produces a Markov chain. Find T f, T? f, and E;(f), E2(f) 
and £3(f) for f € £({0, 1}). 

Solution: Clearly, E1 (f) = Y2f (1) + '/2f 0), Tf) = E(f10) = 4f (1) + pf O) 
and Tf (1) = E(f|0) = pf (1) + qf (0), whence 


1 1 
Exp) = Bf) =? w S42 7@ = 50) + 5 FO. 
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Similarly, 


T’ f (0) = qT f (1) + pT f) = glpf() + 4f] + plaf) + pf] 
= (pP +4) fO) + 2pqf (1) 
T fO) = (pP +97) f(D + 2pqf (0), 


whence 


P +4" +2pq 


E3(f) = E\(T’ f) = 3 


f4 


2p +p +a n 1 1 
7 fO zI ® H zf 0. 


and so on. We see that at the level of expectations of single state variables, the process 
cannot be distinguished from flipping a fair coin. O 


The generalisation from precise to imprecise Markov chains goes as follows: 


Definition 4.6 The uncertain process is a stationary imprecise Markov chain when 
the Markov Condition is satisfied with stationarity: M cx ,..x,) = Q(-|x,) for some 
family of transition credal sets Q(-|x), x € X. 


ag (0, 0, 0) 45; (0, 0, 0) 
(0, 0) =[¢(-|0) (0, 0) ={ Q(-|0) 
(0, 0, 1) (0, 0, 1) 
0 Jq¢|0) 0} O(-|0) 
(0, 1, 0) (0, 1,0) 
(0, 1) CID (0, 1) J Q¢-|1) 
qx 0, 1,1) Fey iW) 
m > M 
(1, 0, 0) (1, 0, 0) 
(1, 0) IJa (10) (1, 0) T210) 
Jx (1,0, 1) sccm 1) 
In {ein 
aa 1,0) +o 1,0) 
(1, 1) J¢Cl)) d, 1) J 2(11) 
Guim Cy 


An imprecise Markov chain can be seen as an infinity of (precise) probability trees: 
choose a precise mass function from M, in each situation s. It should be clear that 
not all of these satisfy the Markov property or stationarity. This implies that solving 
the optimisation problem in order to find the tight upper bounds E(g|s), as discussed 
in Sect. 4.5, is not (necessary always) simply an optimisation over a parametrised 
collection of stationary (or even non-stationary) Markov chains, although it can turn 
out be so simple in a number of special cases. 

For each x € X, the local transition model Q(-|x) corresponds to an upper expec- 
tation operator E(-|x), with E(f |x) = max [E (f): pE Q(-|x)} forall f € L(¥). 
This leads to the following definition, which generalises the definition of transition 
operators for precise Markov chains: 


Definition 4.7 Consider the non-linear transformation T of L(4), called the upper 
transition operator: T: L(X) > L(&): f +> Tf where the real map T f is defined 
by Tf (x) := E(f|x) = max [E (f): pE Q(-|x)} for all x € X. 


62 G. de Cooman 


For any n > 0, we define the upper expectation for the (single) state X, at time n by 


Ef) = E(f(Xn)) = E(f(X,)| ) for all f: X >R. 


Then the Law of Iterated Upper Expectations of Theorem 4.3 yields, with also E; = 
E Mn: 


E, (f) = ET! f) for alln > 1 and all f € L(X), 


so the complexity of calculating E,,(f) is still linear in the number of time steps n. 


4.7 Examples 


Consider a two-element state space Æ = {1,0}, with upper expectation E; = 
Emn for the first variable, and for each (x;,...,x%,) € {1,0}", with 0 < € < 
1, Mæ = Mx, = A — ©O{¢¢lxn)} + € X10), or equivalently, for the upper 
transition operator T = (1 — €)T + € max. In other words, each transition credal 
set O(-|x) is a linear-vacuous mixture (see Exercise 4.2, also for the notations used) 
centred on the transition mass function g(-|x), where the mixture coefficient € is the 
same in each state x. 

It is a matter of simple and direct verification that forn > land f € L(X):T" f = 
(_—e)"’T"f+e Ka 1 — e)‘ max T* f , and therefore, using the Law of Iterated 
Expectations, E,,41(f)=E (TfA=a- ©" E; (T”f)+e€ yoia — €)* max Ts, 
If we now letn — œ, it is not too hard to see that the limit exists and is independent 
of the initial upper expectation E1: 


lim E,(f) =€ xe —«e)‘ max TÝ f for all f € L(Y). 
k=0 


We consider two special cases: 


1. Contaminated random walk: when T f (1)=T f (0) = !/2[ f (1) + f(0)], the under- 
lying precise Markov chain is actually like flipping a fair coin. We then find that 
E.(f) = A —©)'[f (1) + f(0)] + € max f forall f € L(X). 

2. Contaminated cycle: when Tf(1) = f(0) and Tf(O) = f(1), the underlying 
precise Markov chain is actually like deterministic cycle between the states 0 
and 1. We then find that E(f) = max f forall f € £(X). 


The probability intervals for 1 corresponding to these two limit models are given by 
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As another example, we consider ¥ = {a, b, c} and the transition models depicted 
below, which are imprecise models not very far from a simple cycle: 


Cc Cc C 
Cc 
A A A i, 
a Q(-|1) b a Q(-|0) b a Q(-|c) b 
Below, we depict the time evolution of the E,, (as credal sets) for three cases (red, 


yellow and blue). We see that, here too, regardless of the initial distribution Ej, the 
distribution E„ seems to converge to the same distribution. 


4.8 A Non-linear Perron—Frobenius Theorem, 
and Ergodicity 


The convergence behaviour in the previous examples can also be observed in general 
imprecise Markov chains under fairly weak conditions. The following theorems can 
be derived from the more general discussions and results in [3, 5]. 


Theorem 4.4 Consider a stationary imprecise Markov chain with finite state set X 
and upper transition operator T. Suppose that T is regular, meaning that there is 
some n > 0 such that min T" ha > 0 for all x € X. Then for every initial upper 
expectation E}, the upper expectation E, = E; o T"! for the state at time n con- 
verges point-wise to the same stationary upper expectation E»: liMy+oo En(h) = 
lim, Ey(T"'h) := E% (h) forall h in L(&). The limit upper expectation E» is 
the only T-invariant upper expectation on L(X), meaning that E% = E% oT. 


In that case we also have an interesting ergodicity result. For a detailed description 
of the notion of ‘almost surely’, we refer to [3], but it roughly means ‘with upper 
probability one’. 
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Theorem 4.5 Consider a stationary imprecise Markov chain with finite state set X 
and upper transition operator T. Suppose that T is regular with stationary upper 
expectation E ». Then, almost surely, for all h in L(X): 


ly i z 
Eno(h) < liminf — $ 1 h(Xx) < lim sup = $ 1 h(Xx) < Eno (h). 


k=l noo kel 


4.9 Conclusion 


The discussion in this paper lays bare a few interesting but quite basic aspects of 
inference in imprecise probability trees and Markov chains in discrete time. A more 
general and deeper treatment of these matters can be found in [3-5]. For recent work 
on imprecise Markov chains in continuous time, I refer the interested reader to [1, 9]. 
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Chapter 5 A) 
Statistics with Imprecise get 
Probabilities—A Short Survey 


Thomas Augustin 


Abstract This chapter aims at surveying and highlighting in an introductory way 
some challenges and big opportunities a paradigmatic shift to imprecise probabili- 
ties could induce in statistical modelling. Working with an informal understanding 
of imprecise probabilities, we discuss the concepts of model imprecision and data 
imprecision as the two main types of imprecision in statistical modelling. Then we 
provide a short survey of some major developments, methodological questions and 
applications of imprecise probabilistic models under model imprecision in the con- 
text of different inference schools and summarize some recent developments in the 
area of data imprecision. 


5.1 Introduction 


By promising powerful solutions to some of the deepest foundational problems of 
probability and statistics, imprecise probabilities offer great opportunities also for 
the area of statistical modelling. Still, in statistics, theories of imprecise probabili- 
ties live in the shadows, and admittedly the development of many of the imprecise 
probability-based methods is often in a comparatively early stage. Nevertheless, in 
all areas of statistics, the desire for a more comprehensive modelling of complex 
uncertainty had popped up again and again. The rather scattered work covers a huge 
variety of methods and topics, ranging from contributions to the methodological and 
philosophical foundations of inference to very concrete questions of applications. 


T. Augustin (BJ) 

Department of Statistics, Ludwig-Maximilians-Universitat München (LMU Munich), 
Munich, Germany 

e-mail: augustin @ stat.uni-muenchen.de 


© The Author(s) 2022 67 
L. Aslett et al. (eds.), Uncertainty in Engineering, 

SpringerBriefs in Statistics, 

https://doi.org/10.1007/978-3-030-83640-5_5 


68 T. Augustin 


The present chapter aims at providing a rough and informal survey of some of the 
major questions and developments.! It is structured as follows. Section 5.2 collects 
the basic concepts that are needed later on. Then, Sect. 5.3 looks at the major sources 
of imprecision in statistical modelling. We distinguish there between several types 
of data imprecision and of model imprecision. Section5.4 focuses on the issue of 
model imprecision and discusses it from the angle of different inference schools. We 
put some emphasis on the (generalized) frequentist and Bayesian setting, but also 
briefly adopt other perspectives. In Sect.5.5, approaches to handle so-called ontic 
and epistemic data imprecision, respectively, are surveyed. Section 5.6 is reserved 
for some concluding remarks. 


5.2 Some Elementary Background on Imprecise 
Probabilities 


In this section, we briefly summarize the concepts in the background. With respect 
to the basic notions and the technical framework for statistical inference, we refer to 
the first chapter of this book [41]. We rely on the same basic setting, where we use 
observations (data) on some underlying stochastic phenomenon? to learn character- 
istics of that mechanism, mathematically described by an unknown parameter of the 
underlaid probability model. 

With respect to imprecise probabilities, a very rough and eclectic understanding 
shall be sufficient to read this chapter.* It is not necessary, for our aims here, to dis- 
tinguish different approaches with respect to many technical details. Thus, in a rather 
inclusive manner, we subsume here under imprecise probabilities any approach that 
replaces in its modelling precise, traditional probabilities p(-) by non-empty sets P 
of precise probabilities as the basic modelling entity, including also all approaches 
that can be equivalently transferred into a set of precise probability. This comprises 
approaches directly working with sets of probabilities, like robust Bayes analysis (see 
Sect. 5.4.2), Kofler & Menges’ linear partial information (e.g. [42]), Levi’s approach 
to epistemology (e.g. [45]), as well as the whole bunch of corresponding approaches 
based on non-linear functionals and non-additive set-functions, covering lower and 
upper previsions in tradition of Walley’s book [66], interval probabilities building on 


l A longer survey discussing the state of the art of statistical inference with imprecise probabilities 
at that time is aimed at by [8]. 

2 To avoid any commitment to a certain interpretation of probability, we use the term “stochastic 
phenomenon” as a kind of neutral, superordinate concept, in particular including approaches refer- 
ring to random phenomena only as well as approaches addressing generally situations of epistemic 
uncertainty. 

3 An introduction into imprecise probabilities on an intermediate level is aimed at by [4]; see also 
[57] (in this volume) for an introduction, [15] for a survey by a philosopher and [9] for a review with 
an engineering background. Current developments in research on imprecise probabilities are mostly 
discussed at the biannual ISIPTA (International Symposium on Imprecise Probabilities: Theories 
and Applications) meetings; see [1, 5, 27] for the most recent ones. 
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Weichselberger ([72]*), probabilistically interpretable approaches based on capaci- 
ties including the suitable branch of Dempster-Shafer theory (e.g. [29]), random sets 
(e.g. [12, Chap. 3]) and p-boxes following [32]. Moreover, there is a smooth transition 
to several approaches propagating systematic sensitivity analysis (e.g. [52]). 

It is important that our basic entity, the set P, has to be understood and treated as 
an entity of its own. It is by no means possible to distinguish certain of its elements 
as more likely than others or to mix its elements, eventually leading to a precise 
traditional probability distribution. P may be assessed directly by collecting several 
precise models, so-to-say as possible worlds / expert opinions, to be considered. More 
often P is constructed, typically as the set of all probability distributions 


e that respect bounds on the probabilities of certain classes of events or, more gen- 
erally, the expectations of certain random variables, ° 

e or that are (in a topologically well-defined sense) close to a certain precise prob- 
ability distribution po(-) (neighbourhood models), providing a formal framework 
for expressing statements like “po(-) is approximately true”,’ 

e or that are described by a parameter varying in an interval/cuboid (parametrically 


constructed models), like “imprecise versions” of a normal distribution.’ 


5.3 Types of Imprecision in Statistical Modelling 


There are many situations in statistics where imprecision occurs, i.e. where a careful 
modelling should go beyond the idealized situation of perfect stochasticity and data 
observed without any error and in an ideal precision. To study these situations further, 
an ideal-typical distinction between model and data precision is helpful. 

Model Imprecision has to be taken into account whenever there is doubt in a con- 
crete application that the strong requirements (perfect precision and, by the additivity 
axiom, absolute internal consistency) the traditional concept of probability calls for 
can be realistically satisfied. This includes all situations of incomplete or conflicting 
information, robustness concerns, and repeated sampling from a population where 
the common assumption of i.i.d. repetitions is violated by some unobserved hetero- 
geneity or hidden dependence. In a Bayesian setting, in addition, quite often, if at 
all, a precise prior distribution is not honestly deducible from the knowledge actually 
available. Ellsberg’s [31] seminal thought experiments on urns with only partially 
known compositions have made it crystal clear that the amount of ambiguity, i.e. the 


4 See also [73] for a summary in English language. 
5 For instance, in the approaches founded by [66, 72], respectively. 


© It may be explicitly noted that the latter also includes comparative/ordinal probabilities (e.g. [49]), 
only ordering the probability of certain events like “A is more likely to occur than B, and B more 
likely than C’. 


7 See Sect. 5.4.1. 
8 See, for instance, the sets of conjugated distributions in Sect. 5.4.2 for an example. 
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uncertainty about the underlying stochastic mechanism, plays a constitutive role in 
decision making and thus has to modelled carefully. 

Data Imprecision comprises all situations where the observations are not avail- 
able in the granularity that is originally intended in the corresponding application. 
Following [26], for the modelling of data imprecision, it is crucial to distinguish 
two situations, the precise observation of something inherently imprecise (ontic data 
imprecision) and the imprecise observation of something precise (epistemic data 
imprecision).? The difference between these two may be explained by a pre-election 
study where some voters are still undecided which single party they will vote for. 
(Compare [55], and for more details [44, 53]). If we understand a voting preference 
like “I am still undecided between parties A and B” as a political position of its own, 
then we interpret the information as an instance of ontic imprecision. If we focus 
on the forthcoming election and take this information as an imprecise observation 
allowing us to predict that the voting decision on the election day will either be 
A or B, then we are coping with epistemic imprecision. In particular in engineer- 
ing, another frequent example of epistemic data imprecision occurs from insufficient 
measurement precision, where intervals instead of precise values are observed. A 
contrasting example where (unions of) intervals are to be understood as an entity of 
their own arises when the time span of certain spells is characterized: “This machine 
was under full load from November 10th to December 23rd.” 

Epistemic imprecision very naturally occurs in many studies on dynamic pro- 
cesses, from socio-economic over medical to technical studies. There, censoring is 
always a big issue: the spells of some units are still unfinished when the study ends, 
providing only lower bounds on the spell duration. Typical examples include the 
duration of unemployment, the time to recurrence of a tumour or the lifetime of an 
electronic component. In addition, also interval censoring is quite common, where 
one only learns that the event of interest occurred within a certain time span. It should 
be noted explicitly that missing data, a frequent problem for instance in almost every 
social survey, may be comprised under this setting, taking the whole sample space 
as the observation for those units missing. 


5.4 Statistical Modelling Under Model Imprecision 


Of course, there are also strong reservations against imprecise probabilities in tra- 
ditional statistics. For a traditional statistician, imprecise probabilities are just a 
superfluous complication, misunderstanding either the generality of the concept of 
uncertainty or the reductionist essence of the modelling/abstraction process. Indeed, 
for an orthodox Bayesian, all kinds of not-knowing are simply situations under uncer- 
tainty, and any kind of uncertainty is eo ipso expressible by traditional probabilities. 
From the modelling perspective, imprecision is taken as part of the residual category 


? See also the distinction between the disjunctive and conjunctive interpretation of random sets, as 
discussed, e.g. in [30, Sect. 1.4]. 
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that is naturally lost when abstracting and building models.!° Box (and Draper)’s 
often cited dictum “Essentially, all models are wrong, but some of them are useful” 
[14, p. 424] has generally been (mis)understood as a justification to base details of 
the model choice on mathematical convenience and as an irrevocable argument to 
take model imprecision as negligible. 


5.4.1 Probabilistic Assumptions on the Sampling Model 
Matter: Frequentist Statistics and Imprecise 
Probabilities 


Indeed, however, Box’s quotation could be continued by “...and some models are 
dangerous”, since the general neglecting of model imprecision implicitly presup- 
poses a continuity (in an informal sense) of the conclusions in the models that by 
no means can be taken as granted. A well-known example from robust statistics!! 
is statistical inference from a “regular bell-shaped distribution”. The standard pro- 
ceeding would be to assume a normal distribution. But, for instance, the density of 
a Cauchy distribution is phenomenologically de facto almost indistinguishable from 
the density function of a normal distribution, suggesting both models to be equivalent 
from a practical point of view. However, statistical inference based on the sample 
mean shows fundamentally different behaviours. '* Under normality, the distribution 
of the sample mean behaves nicely, contracting itself around the correct value if the 
sample size n increases. In the case of the Cauchy distribution, however, the distri- 
bution of the sample mean stays the same irrespective of the sample size,!? making 
any learning by the sample mean impossible. 

This shocking insight—optimal statistical procedures may behave disastrously 
even under “tiny deviations” from the ideal model—demonstrates that imprecision 
in the underlying model may matter substantially. In this context, the theory of (fre- 
quentist) robust statistics as the theory of approximately true models emerged, and 
imprecise probabilities provide a natural superstructure upon it (see, e.g. [38] for 
historical connections). In particular, neighbourhood models, also briefly mentioned 
above, have become attractive.'t Building on an influential result by Huber and 
Strassen [39], a comprehensive theory of testing in situations where the hypotheses 


10 This is particular the case when the prescriptive character of models is emphasized over its 
descriptive role (cf., e.g. [13, p. 99]). 


1! See also [8, Sect. 7.5.1, in particular, Fig. 7.3] to illustrate this. 


12 Generally, if one looks at an i.i.d. sample X1, . . . , Xn from a normal distribution with parameters 
u and o”, denoted by N(u, o’), and an i.i.d. sample Z1, ..., Zn from a Cauchy distribution with 
parameter œ and f, denoted by C(«, £), respectively, one obtains for the sample means X := 


1 ry Xi ~ N(u, dj and Z := 1 "1 Zi ~ C(a, B). 
13 The Law of Large Numbers is not applicable to the Cauchy distribution because it does not have 


moments. 
14 See, e.g. [6] for a survey. 
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are described by imprecise probabilities emerged (see [2], [8, Sect. 7.5.2] and the 
references in the corresponding review sections therein.) Further insights are pro- 
vided from interpreting frequentist statistics as a decision problem under an imprecise 
sampling distribution, leading to the framework investigated in [34]. 

Other frequentist approaches, starting from different angles, include Hampel’s 
frequentist betting approach (e.g. [36]) and some work on minimum distance esti- 
mation under imprecise sampling models (e.g. [35]). The statistical consequences of 
the chaotic models in the genuine frequentist framework to imprecise probabilities, 
developed by Fine and followers (e.g. [33]), might be quite intriguing, but are still 
almost entirely unexplored. 


5.4.2 Model Imprecision and Generalized Bayesian Inference 


Priors and Sets of Priors, Generalized Bayes Rule. The centrepiece of Bayesian 
inference is the prior distribution. Apart from very large sample sizes, where the 
posterior is de facto determined by the sample, the prior naturally has a strong 
influence on the posterior and on all conclusions drawn from it. In the rare situations 
where very strong prior knowledge is available, it can be used actively, but most 
often the strong dependence on the prior has been intensively debated and criticized. 

Working with sets II of prior probabilities (or interval-valued priors) opens new 
avenues here. This set can naturally be chosen to reflect the quality/determinacy of 
prior knowledge: strong prior knowledge leads to “small” sets; weak prior knowl- 
edge to “large” sets. Typical model classes include neighbourhood models or sets of 
parametric distributions which often are conjugate '> to the sampling distribution, 
which typically still is assumed to be precise.'° In imprecise probability, TI is under- 
stood as naturally inducing the set I, of all posteriors arising from a prior in I.!” 
Interpretations of IT and I, vary to the extent they are understood as principled enti- 
ties. A pragmatic point of view sees an investigation of I, just as a self-evident way 
to perform a sensitivity analysis. On the other extreme, Walley’s [66] generalized 
theory of coherence, having initiated the most vivid branch of research on imprecise 
probabilities, provides a rigorous justification of exactly this way to proceed as the 
“Generalized Bayes Rule (GBR)”. Important developments have also been achieved 
for a variety of different model classes under the term “Robust Bayesian Analysis”; 
see, e.g. [58] for a review on this topic. 

Near Ignorance Models. One important way to use sets of priors is that they allow 
for quite a natural formulation of (rather) complete ignorance. A traditional Bayesian 


15 See, e.g. [41, Chap. 1.2] (in this volume). 
16 Imprecise sampling models are quite rarely studied, see, however, [66, Sect. 8.5] and [61]. 


17 There are some approaches in the traditional Bayesian context that work with sets of priors, too, 
but merely understanding such a set as an interim step, based on which an ultimate precise prior is 
selected. This can be done in a data-driven way, as in empirical Bayes approaches, including the 
ML-II approach (e.g. [11, Sect. 3.5.4]), or generally, as in maximum entropy approaches (e.g. [11, 
Sect. 3.4]). 
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model eo ipso fails in expressing ignorance/non-informativeness. Assigning a pre- 
cise probability is never non-committal; every precise prior delivers probabilistic 
information about the parameter. The genuinely non-informative model is the set 
of all probability distributions. While this model would yield vacuous inferences, 
it motivates so-called near-ignorance models, where, informally spoken, the inner 
core of this set is used, excluding extreme probabilities that are immune to learning. 
Near-ignorance models still assign non-committal probabilities to standard events 
in the parameter space, but allow for learning. By far the most popular model is the 
Imprecise Dirichlet Model (IDM) [67] for categorical data. Different extensions fol- 
lowed, including general near-ignorance models for exponential families (e.g. [10]). 
Another direction of enabling the formulation of near-ignorance uses all priors with 
bounded derivatives ([68]; for a general exposition of the concept of bounded influ- 
ence, see the book [69]). 

Prior-Data Conflict. In some sense, a complementary application of generalized 
Bayesian inference is the active modelling of prior-data conflict. In practice, general- 
ized Bayesian models are quite powerful in expressing substantial prior knowledge. 
In particular, in areas where data are scarce, it is important to use explicitly all prior 
knowledge available, for instance by borrowing strength from similar experiments. 
Then, however, it is crucial to have some kind of alert system warning the analyst 
when the prior knowledge appears doubtful in the light of data. Indeed, sets of pri- 
ors can be designed to react naturally to potential prior-data conflict: If data and 
prior assumptions are in agreement, the set of posterior distributions contracts more 
and more with increasing sample size. In contrast, in the case of prior-data conflict 
and intermediate sample size, the set of posterior distributions is inflated substan- 
tially, perfectly indicating that one should refrain from most decisions before having 
gathered further information. !* 


5.4.3 Some Other Approaches 


With generalized Bayesian approaches, and less pronounced with generalized fre- 
quentist statistics, the major statistical inference school are also predominant in the 
area of imprecise probabilities. Nevertheless, there has also been considerable suc- 
cess in other inference frameworks. Again and again, the desire to save Fisher’s 
fiducial argument, aiming at providing probability statements on parameters without 
having to rely on a prior distribution, has been a driving force for developments 
in imprecise probabilities. Dempster’s concept of multivalued mappings (e.g. [28]), 
which become even more famous in artificial intelligence by Shafer’s reinterpretation 
founding Dempster-Shafer Theory (see, for instance, again the survey by [29]), is to 
be mentioned here, but also work by Hampel (e.g. [36]) and by Weichselberger (e.g. 


'8 See [71] for a general treatment in one-parameter exponential families, also providing some 
illustrating plots, and, for instance, [70] for an application in the context of system reliability. 
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[74]), see also [7] tracing back its roots. 19 A generalized likelihood-based framework 
has been introduced by Cattaneo (e.g. [16, 17]). 

Another direct inference approach is Nonparametric Predictive Inference (NPI), 
as introduced by Coolen originally for Bernoulli data [18]. Based on exchangeability 
arguments, NPI yields direct conditional probabilities of further real-valued random 
quantities, relying on the low structure assumption that all elements of the natural 
partition produced by the already observed data are equally likely; see, for instance, 
[19] for a detailed discussion, [3] for a clear embedding into imprecise probabilities 
and [21] for a web-page documenting research within this framework. The basic 
approach can be naturally extended to censored data/competing risks (e.g. [22]), 
and to categorical data [20]. NPI has been developed further in a huge variety of 
fields; see, for instance, [23, 24] for recent applications in biometrics and finance, 
respectively. 


5.5 Statistical Modelling Under Data Imprecision 


In this section, we turn to statistical modelling under data imprecision. Keeping the 
distinction from Sect. 5.3, we briefly discuss ontic data imprecision and then turn to 
epistemic data imprecision. 

Ontic data imprecision, where we understand the imprecise observation as an 
entity of its own, may be argued to be a border case between classical statistics 
and its extensions. Technically, we change the sample space of each observation to 
(an appropriate subset of) the power set. For instance, recalling the election example 
from Sect. 5.3, this mean that instead of {a, b, c, .. .} representing the vote for a single 
party, we now also allow for combinations {a, b}, {b, c},..., {a, b, C}, . . ., represent- 
ing the indecision between several parties. As long as we are in a multinomial setting, 
nothing has changed from an abstract point of view, providing powerful opportunities 
for complex statistical modelling. In the spirit of this idea, [44, 55] apply multinomial 
regression models, classification trees, regularized discrete choice models from elec- 
tion research, and spectral clustering methods to German pre-election survey data. 
The situation changes substantially when ordinal or continuous data are considered, 
because, after changing to the power set, the underlying ordering structure is only 
partially preserved. 

Epistemic data imprecision is, as the examples at the end of Sect.5.3 show, of 
great importance in many applications and is quite vividly addressed in classical 
statistics. Even here, traditional statistics keeps its focus on full identification, i.e. 
the selection of one single probability model fitting the observed data optimally. One 
searches for, and then implicitly relies on, conditions under which one gets hands 
on the so-to-say deficiency process as a thought pattern, making ideal precise obser- 
vations imprecise. For that purpose, most classical approaches assume either some 


'9Tt may also be argued that, although less explicitly aiming at fiducial reasoning, the work on 
near-ignorance models discussed above fits well into this category. 
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kind of uninformativeness of the deficiency process (independent censorship, coars- 
ening at random (CAR) or missingness (completely) at random (MCAR, MAR)) or 
an explicit modelling of the deficiency process; see the classical work by [37, 46]. 
Both the uninformativeness as well as the existence of a precisely specifiable defi- 
ciency process are very strong assumptions. They are—eo ipso by making explicit 
statements about unobservable processes—typically not empirically testable. When- 
ever these assumptions are just made for purely formal reasons, the price to pay for 
the seemingly precise result of the estimation process is high. In terms of Manski’s 
Law of Decreasing Credibility, results may suffer severely from a loss of their 
credibility, and thus of their practical relevance. 

Against this background, in almost any area of application, the desire for less 
committal, cautious handling of epistemic data imprecision arose. Mostly isolated 
approaches were proposed that explicitly try to take all possible worlds into account in 
a reliable way, aiming at the set of all models optimally compatible with potentially 
true data. These approaches include, for instance, work from reliable computing 
and interval analysis in engineering, like [51], extensions of generalized Bayesian 
inference (e.g. [75]) to reliable statistics in social sciences (e.g. [56]); see also [8, 
Sect. 7.8.2], who try to characterize and unify these approaches by the concept of 
cautious data completion, and the concept of collection regions in [60]. 

There is a smooth transition to approaches that explicitly introduce cautious mod- 
elling into the construction of estimation procedures; see, for instance, for recent 
different likelihood- and loss minimization-based approaches addressing epistemic 
data imprecision, [25, 40, 43, 54]. Such approaches have the important advantage that 
their construction often also allows the incorporation of additional well-supported 
subject matter knowledge, too imprecise to be useful for the precision focused meth- 
ods from traditional statistics, but very valuable to reduce the set of compatible 
models by a considerable extent. 

Congenial is work in the field of partial identification and systematic sensitivity 
analysis, providing methodology for handling observationally equivalent models; 
see [48, 65], respectively, for classical work and [47, 62] for introductory surveys. 
The framework of partial identification is currently receiving considerable attention 
in econometrics, where in particular the embedding of fundamental questions into 
the framework of random sets is of particular importance [50]. 


5.6 Concluding Remarks 


The contribution provided a—necessarily painfully selective—survey of some devel- 
opments of statistical modelling with imprecise probabilities (in a wider sense, also 
including closely related concepts). Both in the area of model imprecision as well as 
under data imprecision, imprecise probabilities prove to be powerful and particularly 


20 “The credibility of inferences decreases with the strength of the assumptions maintained.” [48, 
p. 1]. 
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promising. Further developments urgently needed include a proper methodology for 
simulations with imprecise probabilities (see [64] for recent results), a careful study 
of the statistical consequences of the rather far developed probabilistic side of the 
theory of stochastic processes with imprecise probabilities (e.g. [63]), a more fruit- 
ful exchange with recent research on uncertainty quantification in engineering (see, 
e.g. [59] (in this volume)), an open mind towards recent developments in machine 
learning and more large scale applications. Not only for these topics it is important to 
complement the still recognizable focus on so-to-say defensive modelling by a more 
active modelling. Far beyond sensitivity and robustness aspects, imprecision can 
actively be used as a strong modelling tool. The proper handling of prior-data con- 
flict and the successful incorporation of substantive matter knowledge in statistical 
analysis under data imprecision are powerful examples of going in this direction. 
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Chapter 6 A) 
Reliability rie 


Lisa Jackson and Frank P. A. Coolen 


Abstract This chapter introduces key concepts for quantification of system relia- 
bility. In addition, basics of statistical inference for reliability data are explained, in 
particular, the derivation of the likelihood function. 


6.1 Introduction 


Reliability of systems is of crucial importance in all aspects of human life. Systems 
are understood to be groupings of components in specific structure, with the system 
functioning depending on the functioning of the components and the system struc- 
ture. Uncertainty about the functioning of components leads to uncertainty about 
the system reliability. To study how system reliability depends on the reliability 
of components, several leading methods from the engineering literature are briefly 
introduced in Sect. 6.2. In situations of uncertainty about reliability in engineering, 
appropriate statistical methods are required to deal with the specific nature of data. 
This chapter provides an overview of such basic methods. Section 6.3 introduces the 
key statistical concepts, basic statistical models are presented in Sect. 6.4. Through- 
out the emphasis is on explanation of the likelihood function, which is at the heart 
of most statistical inference approaches as commonly used in reliability applica- 
tions. Unknown model parameters can conveniently be estimated by maximisation 
of the likelihood function, with corresponding theory to assess the uncertainty of 
the estimates. Bayesian methods for statistical inference are based on the likelihood 
function as well, hence understanding of the likelihood function is crucial for study 
of system reliability under uncertainty. Stochastic process models are also crucial to 


L. Jackson (EX) 

Aeronautical and Automotive Engineering, Loughborough University, Loughborough, 
United Kingdom 

e-mail: |.m.jackson@ lboro.ac.uk 


F. P. A. Coolen (Bx) 
Department of Mathematical Sciences, Durham University, Durham, United Kingdom 
e-mail: frank.coolen @durham.ac.uk 


© The Author(s) 2022 81 
L. Aslett et al. (eds.), Uncertainty in Engineering, 

SpringerBriefs in Statistics, 

https://doi.org/10.1007/978-3-030-83640-5_6 


82 L. Jackson and F. P. A. Coolen 


describe variable reliability over time, for example, to reflect the effects of mainte- 
nance on a system’s reliability. A short introduction to such models is presented in 
Sect. 6.5, again with an emphasis on deriving the likelihood function to enable sta- 
tistical inference. As this chapter brings together generic introductory material, no 
specific references are included throughout the text. Instead, the chapter is ended with 
a brief list of useful resources, pointing to a number of important books which are 
highly recommended for further reading, and some brief comments about journals 
in the field. 


6.2 System Reliability Methods 


When modelling systems there are a number of tools, ranging from combinatorial 
methods including reliability block diagrams, fault tree analysis and event tree anal- 
ysis, to more complex methods that cater for a greater range of system characteristics 
including dependencies, e.g. Markov or Monte Carlo simulation methods. Note when 
modelling systems where repair times are involved, these typically do not follow the 
Exponential distribution, e.g. the Lognormal or Weibull distributions may be more 
suitable. Also including maintenance teams can mean dependencies are introduced 
via prioritising strategies. In such instances, more complex methods are required. A 
brief introduction to some of these modelling methods is provided in this section. 


6.2.1 Fault Tree Analysis 


One of the most common quantification techniques is fault tree analysis. This method 
provides a diagrammatic description of the various causes of a specified system 
failure in terms of the failures of its components. The choice of the system failure 
mode often follows from a failure mode and effects analysis (FMEA). There is 
the assumption of independence of failures of the components. Logical gates are 
used to link together events (intermediate events shown as rectangles and basic 
events representing failure events as circles), where the more common gates include 
AND or OR, shown in Fig. 6.1, with an example tree shown in Fig. 6.2. Evaluation 
of the tree using Boolean algebra yields minimal cut sets, denoted as C;, which 
are failure combinations of components that are necessary and sufficient to cause 
failure of the system. Application of kinetic tree theory and the inclusion-exclusion 
principle (Eq. 6.1) enables the system unavailability (Q sys) performance measure to 
be calculated, where P(C;) is the probability of failure of minimal cut set C;. 
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Symbol Name Causal Relation 


OR Output event occurs if at least one of 
the input events occur. 


s AND Output event occurs if all input events 
T occur. 
1 Vote Output event occurs if at least m of the 


TTT input events occur. 


Fig. 6.1 Common fault tree gate types 


Tank rupture due to overpressure 


safe Tenures Tal 


Timer contacts fail Operator fals to 
closed open switch 


Fig. 6.2 Example fault tree structure 
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Ne N. i-1 N: i-1 j-1 

Ons = > PCC) — >>>) PNC Y YO PNGA... 
i=l i=2 j=1 i=3 j=2 k=1 
1 PC Cas N Cy.). (6.1) 


If the unavailability of the system does not meet performance acceptability criteria, 
then the system must be redesigned. An indication of where to make changes in the 
system can be achieved by generating component importance measures. This is a 
measure of the contribution that each component makes to the system failure. One 
such measure is the Fussel—Vesely measure of importance (/ry,), defined as the 
probability of the union of the min cut sets containing each component given that 
the system has failed, as shown in Eq. 6.2. 


P (U{Cjli € C;}) 
Osys 


Try, = (6.2) 


6.2.2 Fault Tree Extensions: Common Cause Failures 


Typically, fault trees have only a limited capability to cater for dependencies within 
systems. One example is that of common cause failures. Safety systems, for example, 
often feature redundancy, incorporated such that they provide a high likelihood of 
protection. However, redundant sub-systems or components may not always fail 
independently. A single common cause can affect all redundant channels at the same 
time, examples include ageing (all made at the same time from the same materials), 
system environment (e.g. pressure or stress related) and personnel (e.g. maintenance 
incorrectly carried out by the same person). There are several methods to analyse 
such occurrences, including beta factor, limiting factor, boundary method and alpha 
method. The beta factor method assumes that the common cause effects can be 
represented as a proportion of the failure probability of a single channel of the 
multi-redundant channel. Hence, it assumes that the total failure probability (Qr) of 
each component is divided into two contributions: (i) the probability of independent 
failure, Q, and, (ii) the probability of common cause failures, Occ. The parameter 
B is defined as the ratio of the common cause failure probability to the total failure 
probability, as shown in Eq. 6.3. 


Qccr _ Qccr 


p= Occr + Qi Or 


There are further extensions to traditional fault tree analysis to cater for a greater 
range of dependencies, e.g. dynamic fault trees. As the nature of the dependency 
becomes more complex, other modelling methods will be more suitable. Other types 
of dependency include standby redundancy, where the probability of failure of the 


(6.3) 
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redundant component may change when it starts to function and experience load, 
hence it is dependent on the operating component and hence failure of both is not 
statistically independent. Another form is multiple state component failure modes, 
where a component can exist in more than one state, hence being mutually exclu- 
sive cannot be considered in the fault tree (not failing in the open mode does not 
mean that it works successfully). In such instances, using Markov modelling meth- 
ods may be desirable. Given the state-space explosion that can exist when using 
Markov approaches, if there are just small subsections of the system exhibiting the 
dependency it may be possible to analyse these subsections with Markov methods 
and use the result embedded in the fault tree approach. 


6.2.3 Phased Mission Analysis 


As systems become more complex, they can be required to undertake a number 
of tasks, typically in sequence. An example might include an aircraft flight, where 
it is required to taxi from the stand, take off, climb to the required altitude, cruise, 
descend, land and taxi to new destination stand. The collection of tasks can be referred 
to as a mission, where each task is denoted by a phase which has an associated time 
period. Mission success requires successful completion of all phases and in addition 
there may be different consequences resulting from the failure in each phase. For each 
phase, an appropriate modelling technique can be employed to assess its reliability or 
availability. When the systems phases are non-repairable then fault tree analysis can 
be used to assess mission and phase success. For non-repairable scenarios, Markov 
or Petri nets can be used. The parameter of interest is the mission reliability. It is not 
appropriate to analyse the reliability of each phase and multiply these together to get 
the mission reliability because (i) the phases are not independent (i.e. failure in one 
phase may influence failure in another phase); (ii) the assumption that all components 
are working at the start of the phase is not correct and (iii) the system can fail on a 
phase change. For the non-repairable case, the initial step is to construct a mission 
fault tree. The general form is as shown in Fig.6.3. The top event is a mutually 
exclusive OR gate, indicating that only one of the input events must happen to cause 
the output event (mission failure). When considering failures in phase 2 onwards, 
you need to include in the fault tree that the mission has been successful up to this 
point, namely, that it has functioned in the preceding phases. For this reason, you can 
see the introduction of the NOT gate, i.e. under the intermediate event “Functions in 
Phase 1’. Alongside this the failure of the component in earlier phases also needs to be 
taken into account, hence the component failure is represented as shown in Fig. 6.4. To 
perform the analysis, both qualitatively and quantitatively, new algebra is required as 
shown in Fig. 6.5. C; corresponds to the failure of component C in phase j, where the 
bar above C; corresponds to the working state. Considering the success of previous 
phases i = 1,..., j — 1, for failure in phase j, makes the analysis non-coherent, 
yielding prime implicant sets from a qualitative analysis (necessary and sufficient 
combinations of events (success and failure)). The size of the problem to be solved can 
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Mission 
Failure 
Fails in Fails in Fails in Fails in 
Phase 1 Phase 2 Phase j Phase m 
Functions in Conditions met Functions in Conditions met 
Phase 1 for failure in Phases 1 -> j-1 for failure in 


Phase 2 Phase j 
Functions in Functions in 
Phase 1 Phase1 | Phase j-1 
\Z NZ 
ZN ZN 


Fails in Fails in 
Phase 1 Phase j-1 
Fig. 6.3 Mission failure fault tree 


Fig. 6.4 Revised component 
representation 


Component C 
failed in Phase j 


(and usually does) become prohibitively expensive, so approximations are required; 
these are usually based on coherent approximations of the non-coherent phases (i.e. 
conversion of prime implicants to minimal cut sets). Approximate quantification 
formulae (e.g. Rare Event, Minimal Cut Set Upper Bound) can then be used. 

When analysing repairable systems, there are two requirements for mission suc- 
cess: (1) the system must satisfy the success requirements throughout each phase 
period and (2) at the phase change times the system must occupy a state which is 
successful for both phases involved. The second point implies that we will con- 
sider failures on phase transition when calculating mission reliability. Analysis with 
repairable systems is very similar in terms of generating the mission model and phase 
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EH CMe OPO oP Ta 
C,.C, =0 
CC, =0 if i<j 
Fig. 6.5 Additional algebra 
to ti-to t2-tı Phase 3 t3-t2 Mission 
Pre-mission N Phase 1 od Phase 2 end 
} “ll > } "| > } | > } | > ) 


Phase 1 Phase 2 Phase 3 


Petri Net ~~ X Petri Ne! - Petri Net 
-> J ) ~~ pen -=> 


— 


Fig. 6.6 Petri Net representation of a phased mission with three phases 


models. This can be achieved using Markov or Petri Net methods. An illustration 
of a Petri Net example is shown in Fig. 6.6, where the circles represent places and 
the rectangular boxes represent transitions. Mission and phase reliabilities can be 
obtained from analysis of the model. 


6.3 Basic Statistical Concepts and Methods for Reliability 
Data 


Consider a random quantity T > 0, often referred to as a ‘failure time’ in relia- 
bility theory, but it can denote any ‘time to event’. Relevant notation for charac- 
teristics of its probability distribution includes the cumulative distribution function 
(CDF) F(t) = P(T < t), the survival function S(t) = P(T > t) = 1 — F(t) (also 
called the reliability function and denoted by R(t)), the probability density function 
(PDF) f(t) = F’(t) = —S'(t) and the hazard rate h(t) = f(t)/S(t). The hazard 
rate has a possible interpretation with conditioning on surviving time t, for small 
ôt > 0, h(t)ét ~ P(T <t+6t|T > t). Harder to interpret, but also of use, is the 
cumulative hazard function (CHF) H(t) = J h(x)dx. Assuming R(0) = 1, we get 
H(t)= fy Lo dx = —In S(t), so S(t) = exp{—H(t)} = exp{— fj A(x)dx}. 
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A constant hazard rate, h(t) = A > 0 for all t > 0, gives S(t) = e~”", the Expo- 
nential distribution. This can be interpreted as modelling ‘no ageing’, that is, if an 
item functions, then its remaining time until failure is independent of its age. This 
property is unique to the Exponential distribution. An increasing hazard rate mod- 
els “‘wear-out’, roughly speaking this implies that an older unit has shorter expected 
residual life, and decreasing hazard rate models ‘wear-in’, implying that an older 
unit has greater expected residual life. It is often suggested that ‘wear-out’ is appro- 
priate to model time to failure of many mechanical units, whereas electronic units’ 
times to failures may be modelled by ‘wear-in’. In a human-life analogy, we can 
perhaps think about ‘wear-in’ as modelling time to death at very young age (‘infant 
mortality’) and ‘wear-out’ as modelling time to death at older age, with a period 
in between where death is mostly ‘really random’, e.g. caused by accidents. This 
‘human-life analogy’ should only be used for general insight, and is included here 
as engineers often claim that ‘typical hazard rates’ for components over their entire 
possible lifetime are decreasing early on, then remain about constant for a reasonable 
period, and then become increasing (“bath-tub shaped’). 

A popular parametric probability distribution for T is defined by the hazard rate 
h(t) = aB(at)*", for a, B > 0. This leads to S(t) = exp {—(at)* }, and is called a 
Weibull distribution with scale parameter «œ and shape parameter £. This distribution 
is often used in reliability, due to the simple form for the hazard rate. For example, 
with 6 = 2 it models ‘linear wear-out’ (‘twice as old, twice as bad’). 

An interesting aspect of reliability data is that these are often affected by censoring, 
in particular, so-called right-censoring. This means that, instead of actually observing 
a time at which a failure occurs, the information in the data is a survival of a certain 
period of time without failing. Clearly, such information must be taken into account, 
as neglecting it would lead to underestimation of expected failure times. 

Two main statistical methodologies use the likelihood function, namely, Bayesian 
methods and maximum likelihood estimation. Hence, derivation of the likelihood 
function is an important topic in reliability inference. Let tı, ..., tf, be observed fail- 
ure times, and c1, ..., Cm right-censored observations. For inference on a parameter 
0 of an assumed parametric model, the likelihood function based on these data is 
L(O|t),..-5tn3 C1, +-+5 Cm) = Ii- f(t;10) [ [j4 S(cil@). This actually requires the 
assumption that the censoring mechanism is independent of the data distribution, if 
that is not the case the dependence would need to be modelled. It is also possible to 
consider the likelihood over all possible probability distributions, so not restricting 
to a chosen parametric model. In this case, the maximum likelihood estimator is the 
so-called Product-Limit (PL) estimator, presented by Kaplan and Meier in 1958. The 
theory of counting processes also provides a powerful framework for nonparametric 
analysis of failure time data, based on stochastic processes and martingale theory. 
A well-known result within this theory is the Nelson—Aalen estimator for the CHF, 
which can be regarded as an alternative to the PL estimator. 
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6.4 Statistical Models for Reliability Data 


This is a very wide topic, we can only mention a few important models. We first con- 
sider regression models for reliability data. Regression models are generally popular 
in statistics, and also very useful in reliability applications, where often Weibull 
models are used, with the survival function depending on a vector of covariates x, 


Nx 


i ) |. Some simple forms are often used for the 


and given by S(t; x) = exp [- (+ 
shape and scale parameters as functions of x, e.g. the loglinear model for œ, spec- 
ified via Ina, = xT, with £ a vector of parameters, and similar models for nx. 
The statistical methodology is then pretty similar to general regression methods, and 
implemented in statistical software packages. Such models need to be fully specified, 
so are less flexible than nonparametric methods, but they allow information in the 
form of covariates to be taken into account. 

Semi-parametric models enable covariates to be taken into account, but do so 
without fully specifying a parametric model, keeping some more flexibility. Usually, 
a parametric form for the effect of the covariates on a nonparametric “baseline model’ 
is assumed. Most famous are the Proportional Hazards (PH) models, presented by 
Cox in 1972. Here, the hazard rate for covariates x is defined by h(t; x) = ho(t) vx, 
with ho(t) the baseline hazard rate (normally left unspecified, so nonparametric), 
and yw, some positive function of x, independent of time ¢ (normally a fully para- 
metric form is assumed for y,). The name of such models results from the fact that 


en = = , independent of t, so the hazard rates corresponding to different covari- 
; x3 
ates are in constant proportion. For these models, In S(t; x) = — ifs h(u; x)du = 


=Y f ho(u)du = wy In So(t), so S(t; x) = [So(t)]”. These models are used most 
for survival data in medical applications, but are also common and useful in reli- 
ability. As there are no assumptions on the form of the baseline hazard rate, they 
provide a valuable method to compare the effect of the covariates. An often used PH 
model is the linear PH model, with Yy = exp{x’ 6}, with £ a vector of parameters. 
We now describe the analysis of this particular model, with no further assumptions 
on ho(t). The goal is to estimate 6 and Ro(t), the baseline survival function related 
to Ao(t). This is far from trivial, as it is not clear how the likelihood function can be 
derived, since this is neither uniquely defined by a fully specified parametric model, 
nor completely free as was the case for fully nonparametric models (leading to the 
PL estimate). Hence, we need to use a different concept. 

Suppose we have data on n items, consisting of r distinct event times, t) < 
ta) <... < tr) (the case of ties among the event times is a bit more complicated), 
and n — r censoring times. Let R; be the risk set at ta), so all items known to be 
still functioning just prior to ta). We can now estimate 6 via maximisation of the 
‘likelihood function’: 


r exp [eng] 
a Lier, exp {x P} 


i=1 


(6.4) 
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with xg) the vector of covariates associated to the item observed to fail at ta), etc. 
There have been many justifications for L(£), the nicest is the original one by Cox, 
which is as follows. Let us consider R; at tọ). The conditional probability that the 
item corresponding to xç) is the one to fail at the time ta), given that there is a failure 
at ta), is equal to 


hlt; xa) «SP [kog] 
Mier, Ato; x1) Vier, €XP {x7 B} 


Now L(£) is formed by taking the product of all these terms over all failure times, 
giving a ‘likelihood’ which is conditional on the event times, sometimes called the 
‘conditional likelihood’ (aka ‘partial likelihood’ or ‘marginal likelihood’). Note that 
the actual event times tœ) are not used in Eq.6.4, just the ordering related to the 
values of the covariates. This relates to the fact that we do not have any knowledge 
or assumptions about ho(t). Large sample theory is available for L(8), allowing 
estimation and hypothesis testing similarly as for standard maximum likelihood 
methods. 

Next, one must consider estimation of the survival function. Once the estimate for 
B has been derived, let us denote this by B, it is possible to obtain a nonparametric 
estimate of the baseline survival function. Let 


So(t) = gi Qj, 


Jt st 
where the & j 8 are derived via 


on, = 1 _ Àj i 
Jier, Ài 


and 
Àj = exp {x7 B} , 


and taking 6 = B. This actually gives the maximum likelihood estimate for the 
survival function, under the assumption that £ is indeed the given estimate £. 


6.5 Stochastic Processes in Reliability—Models 
and Inference 


Suppose we have a system which fails at certain times, where there may be some 
actions during this period which affect failure behaviour, e.g. minimal repairs to 
allow the system to continue its function, or replacement of some components, or 
other improvements of the system. Let the random quantities T < Dh< T3 <... 
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be the times of failure of the system, and let X; = T; — T;—ı (with Ty = 0) be the 
time between failures i — 1 and i. These X;, or in particular trends in these, are 
often of main interest in analysis of system failure, e.g. to discover whether or not a 
system is getting more or less prone to fail over time. Hence, the major concern is 
often detection and estimation of trends in the X;. Therefore, we cannot just assume 
these X;’s to be (conditionally) independent and identically distributed (iid), as is 
often assumed for standard statistical inference on such random quantities. Instead, 
we need to consider the process in more detail. A suitable characteristic for such a 
process is the so-called ‘rate of occurrence of failure’ (ROCOF). Let N (t) be the 
number of failures in the period (0, t], then the ROCOF is 


d 
v(t) = Apr: 


An increasing (decreasing) ROCOF models a system that gets worse (better) over 
time. Of course, all sorts of combinations can also be modelled, e.g. first a period 
of decreasing ROCOF, followed by increasing ROCOF, to model early failures after 
which a system improves, followed by a period in which the system wears out. 
Note that the ROCOF is not the same as the hazard rate (the definitions are clearly 
different!), although intuitively they might be similar. If we consider a standard 
Poisson process, with iid times between failures being Exponentially distributed, 
then the ROCOF and hazard rate happen to be identical. 

An estimator for v(t) is derived by defining a partition of the time period of inter- 
est, counting the number of failures in each of the intervals of this partition, and 
dividing this number by the length of the corresponding interval if necessary (i.e. if 
not all intervals are of equal length). However, it is more appealing to use likelihood 
theory for statistical inference, which we explain next for nonhomogeneous Pois- 
son processes (NHPP), for which the ROCOF is a central characteristic often used 
explicitly to define such processes. 

NHPP are relatively simple models that can be used to model many reliability 
scenarios, and for which likelihood-based statistical methodology is well developed 
and easy to apply. The crucial assumption in these models is that the numbers of 
failures in distinct time intervals are independent if the process characteristics are 
known. A NHPP with ROCOF v(t) is easiest defined by the property that the number 
of failures in interval (tı, t2] is a Poisson distributed random quantity, with mean 


m(ti, t2) = f ; v(t)dt. 


This implies that the probability of 0 failures in interval (tı, t2] equals exp{—m(t,, t2)}, 
and the probability of 1 failure in this interval equals m (tı, t2) exp{—m(h, t2)}. Of 
course, if v(t) is constant we have the standard Poisson process. For statistical infer- 
ence, we wish to find the likelihood function corresponding to a NHPP model, given 
failure data of a system. Suppose we have observed the system over time period 
[0, r], and have observed failures at times f < ty <... < tn <r, assuming there 
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were no tied observations (so only a single failure at each failure time; else things 
get slightly more complicated). The likelihood function is derived in a similar way 
as for iid data, that is, the reasoning of using PDFs at failure times in the likelihood 
for such data. Let ôt; > 0, fori = 1,...,, be very small. The process observed can 
then be described as consisting of: 0 failures in (0, tı), and 1 failure in [f,, t; + 61), 
and 0 failures in [t; + 64), t2), etc., until no failures in [t + d¢,, r] (this last bit is just 
deleted if r = t,, so when observation of the process is ended at the moment of the 
n-th failure). To derive the corresponding likelihood function, we take the product 
of the probabilities for these individual events, so 


exp{—m(0, t1)} x m(t, ti + 6t,) exp{—m(t, ti + 6t)} x exp{—m(t, + 64, t2)} x... 
... X exp{—m(th + btn, r)} 


n t;+6t; r 
= {1 p noar] x exp -f voar f 
ti 0 


i=1 


Now, use that for very small ôt;, we have that 


ti+ôti 
i v(t)dt © v(ti)ôti. 
t 


i 


Now we divide through by []}_, ôt;, and let all ôt; | O (this is, exactly the same that 
leads to the PDFs appearing in the likelihood function for iid data). This gives the 
likelihood function, for this model and based on these n failure data and observation 


over the period [0, r]: 
LS ii ro] exp |- f voar ; 
i=l 9 


For optimisation, it is easier to use the log-likelihood function, which is also needed 
for related statistical inference, and which is equal to: 


l= Xoin v(t) — f v(t)dt. 
i=1 


It is possible to work with this likelihood non-parametrically, but often one assumes 
a parametric form for the ROCOF, making maximum likelihood estimation again 
conceptually straightforward (although it normally requires numerical optimisation). 
Two simple, often used parametric ROCOFs are 


vi (t) = exp(Bo + Bit) 
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and 
v(t) = ynt}, 


with y, 7 > 0. 

Many models that have been suggested, during about the last three decades, for 
software reliability, are NHPPs which model the software testing process as a fault 
counting process. A famous model was proposed by Jelinski and Moranda in 1972, 
which is based on the following assumptions: (1) software contains an unknown 
number of bugs, N; (2) at each failure, one bug is detected and corrected; (3) the 
ROCOF is proportional to the number of bugs present. So, they use a NHPP with 
failure times T;,i = 1,..., N and Tọ = 0, defined by 


vt) =(N—i+1), fort €[T-1, Ti), 


for some constant A. Then N and å are both considered unknown, and estimated from 
data, where, of course, inference for N tends to be of most interest, or, in particular, 
the number of remaining bugs. Many authors have contributed to such theory by 
changing some model assumptions. For example, non-perfect repair of bugs has been 
considered, and even the possibility of such repair introducing new bugs (possibly a 
random number), for this last situation so-called ‘birth-death processes’ can be used. 
Also non-constant A has been considered, e.g. with the idea that some bugs may 
tend to show earlier than others. Also Bayesian methods for such models, and even 
software reliability models more naturally embedded in Bayesian theory, have been 
suggested and studied. However, although there is an enormous amount of literature 
in this area, as indeed mathematical opportunities appear to have no limit here, the 
practical relevance of such models seems to be rather limited and few interesting 
applications of such models have been reported in software reliability. Recently, the 
important topic of testing of reliability of systems including software has received 
increasing attention, which is much needed to ensure reliable systems. 
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Chapter 7 R) 
Simulation Methods for the Analysis get 
of Complex Systems 


Hindolo George-Williams, T. V. Santhosh, and Edoardo Patelli 


Abstract Everyday systems like communication, transportation, energy and indus- 
trial systems are an indispensable part of our daily lives. Several methods have been 
developed for their reliability assessment—while analytical methods are computa- 
tionally more efficient and often yield exact solutions, they are unable to account for 
the structural and functional complexities of these systems. These complexities often 
require the analyst to make unrealistic assumptions, sometimes at the expense of accu- 
racy. Simulation-based methods, on the other hand, can account for these realistic 
operational attributes but are computationally intensive and usually system-specific. 
This chapter introduces two novel simulation methods: load flow simulation and 
survival signature simulation which together address the limitations of the existing 
analytical and simulation methods for the reliability analysis of large systems. 


7.1 Introduction 


A system is classed as complex from one of two fronts—in terms of the functional 
relationships between its components and in terms of its structure. A structurally 
complex system does not conform to a series, parallel, or series-parallel configuration. 
Most real-world systems are composed of components that can operate at multiple 
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performance levels or states and components with a functional coupling with other 
components. Such systems are deemed functionally complex, since their states cannot 
be directly deduced from their traditional two-state structure functions. They are 
characterised by multiple states, with the number of states determined by the diversity 
in the states of their components, structure and the functional relationships between 
their components [21]. In these systems, the number of performance levels may 
or may not be finite, depending on the performance measure under consideration 
and the type of system [21]. For instance, the power generated by a power plant 
may take any value between zero and its maximum achievable value, depending 
on the performance levels of its component and the demand on the grid. Complex 
systems may be standalone or form an indispensable part of some critical system like 
healthcare, safety-critical and industrial control systems. It is, therefore, important 
to be able to assess their susceptibility to failures, as well as quantify and predict the 
ensuing consequences, for effective planning of restoration and mitigation measures. 


7.2 Reliability Modelling of Systems and Networks 


In system reliability evaluation, the analyst has numerous techniques at their disposal, 
which can be classified as heuristic-, analytical- or simulation-based [1] and further 
as static or dynamic. In particular, dynamic techniques not only model the system 
based on the functional and structural relationships between its components, but also 
support dynamic relationships like inter-component and inter-system dependencies. 


7.2.1 Traditional Approaches 


Reliability Block Diagrams and Fault Trees have been extensively used in the reli- 
ability evaluation of binary-state systems. Both techniques have proven particularly 
useful for moderately sized systems with series-parallel configurations. However, 
they become difficult to apply with large or complex systems and often require 
additional techniques to decompose the system. The Reliability Graph [40] was, 
therefore, developed to overcome this difficulty and proved very efficient in mod- 
elling structural complexities. Reliability block diagrams, fault trees and reliability 
graphs, however, assume components to be statistically independent, which renders 
them inadequate for systems susceptible to restrictive maintenance policies and inter- 
component dependencies. However, techniques including but not limited to dynamic 
reliability block diagrams [10], dynamic fault trees [6], condition-based fault trees 
[35], dynamic flow graphs [2], Petri Nets [26] and other combinatorial techniques 
[38] have been developed to model these dynamic relationships. They have found 
application in a wide range of reliability engineering problems, including repairable 
systems with restrictive maintenance policies. 
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Though the earliest forms of these techniques including binary decision diagrams 
were applicable only to binary-state systems, numerous instances of their recent 
extension to multi-state systems exist, see, e.g. [39]. However, these extensions either 
require state enumeration or the derivation of the minimal path or cut sets of the 
system, which is an NP-hard problem [41]. 

The extended block diagram technique and graph-based algorithms share 
two common limitations. First, they define reliability with respect to the maximum 
flow through the system. Therefore, they are limited to systems with single output 
nodes and those with multiple output nodes where only the presence of flow at these 
nodes is relevant and not the relative magnitude of the flow. The second limitation 
arises from the assumption that there are no flow losses in the system, making them 
inapplicable to certain practical systems like energy systems and pipe networks, 
susceptible to losses in some failure modes. More recently, various researchers have 
made invaluable contributions to multi-state system reliability analysis, developing 
techniques applicable to a wide range of systems [22]. These techniques have mainly 
been based on either the structure function approach, stochastic process, simulation 
or the Universal Generating Function approach [21, 25]. 

The most popular stochastic process employed in system reliability analysis is 
the Markov Chain (MC), which involves enumerating all the possible states of the 
system and evaluating the associated state probabilities [25]. This technique is only 
easily applicable to exponential transitions or distributions with simple cumulative 
distribution functions, requires complicated mathematics and becomes complex for 
large systems. For an M component binary-state system, the number of states in 
the model ranges from M + 1 for series systems, to 2” for parallel systems. For 
large multi-state systems, the number of states increases exponentially, rendering the 
model difficult to construct and expensive to compute. 

The Universal Generating Function was introduced to address the state explo- 
sion problem of the MC. It allows the algebraic derivation of a system’s perfor- 
mance from the performance distribution of its components [21, 24]. However, both 
the Universal Generating Function and Markov Chain are limited in the number 
of reliability indices they can quantify. Also, like all multi-state system reliability 
evaluation techniques, they are maximum-flow-based and assume flow conservation 
across components. The Universal Generating Function, though straightforward for 
series/parallel systems, it requires a substantial effort for complex topologies. 

Simulation methods are the most suitable for multi-state system reliability and 
performance evaluation, since they mimic the actual operation of systems. Their 
advantage over their analytical counterpart is due to the fact that they support any 
transition distribution, allow the effects of external factors on system performance to 
be investigated [43] and are easily integrated with other methods [36]. In particular, 
they allow the explicit consideration of the effects of uncertainty and imprecision 
on the system, providing a powerful tool for risk analysis and by extension, ratio- 
nal decision-making under uncertainty. They are, therefore, mostly used to analyse 
systems for which analytical approaches are inadequate. However, even some of the 
existing simulation methods [23, 43] require prior knowledge of the system’s path 
set, cut set or structure function and are mostly limited to binary-state systems [42]. 
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7.2.2 Interdependencies in Complex Systems 


Engineers and system designers are under immense pressure to build systems robust 
and adequate enough to meet the ever-increasing human demand and expectation. 
Unavoidably, the resulting systems are complex and highly interconnected, which 
ironically constitute a threat to their resilience and sustainability [18]. Two systems 
are interdependent if at least a pair of components (one from each system) are cou- 
pled by some phenomena, such that a malfunction of one affects the other. In such 
systems, an undesirable glitch in one system could cascade and cause disruptions 
in the coupled system. The cascade could be fed back into the initiating system 
and the overall consequences may be catastrophic [5]. To minimise the effects of 
failures, some interdependent systems are equipped with reconfiguration provisions. 
This normally entails transferring operation to another component, rerouting flow 
through alternative paths, or shutting down parts of the system. 

The achievement of maximum overall system performance is, in general, desir- 
able. However, in many applications (nuclear power plants, for instance), it is more 
important to guarantee system availability and recovery in the shortest possible time, 
following component failure [16]. Interdependencies are manifested in engineering 
systems at two levels: between components (inter-component), which can be func- 
tional or induced and between systems/subsystems (inter-system) [15]. 

Functional dependencies are due to the topological and/or functional relationships 
between components. Induced dependencies, on the other hand, are due to a state 
change in one component (the initiator) triggering a corresponding state change in 
another (the induced), such that even when the initiator is reinstated, the induced does 
not reinstate, unless manually made to do so. Functional dependencies in standalone 
systems are intrinsically accounted for by the innate attributes of the system reliability 
modelling and evaluation technique while induced dependencies require explicit 
modelling. Inter-system dependencies, on the other hand, are due to functional or 
induced couplings between multiple systems. The functional dependencies in these 
systems, however, may require explicit modelling. This is the case for components 
relying on material generated by another system. For instance, an electric pump in a 
water distribution system relies on the availability of the electricity network. 

Induced dependencies are further divided into Common-Cause Failures (CCF) 
[27] and cascading events, as summarised in Fig.7.1. Common-cause failures are 
the simultaneous failure of multiple similar components due to the same root cause. 
Their origin is traceable to a coupling that normally is external to the system. Notable 
instances are shared manufacturing lines, shared maintenance teams, shared envi- 
ronments and human error. A group of components susceptible to the same CCF 
event is called a Common-Cause Group (CCG). An important point to note about 
common-cause failures is that, on occurrence of the failure event, there is a probabil- 
ity associated with multiple component failure and that the affected components fail 
in the same mode. Consequently, the number of components involved in the event 
ranges from | to the total number of components in the CCG. CCF events may affect 
an entire system or only a few of its components and, therefore, pose a consider- 


7 Simulation Methods for the Analysis of Complex Systems 99 


Interdependencies 


Inter-component Inter-system 


Functional Induced 


Common-Cause Failures Cascading Failures 


Fig. 7.1 Types of interdependencies in complex systems. Functional dependencies—such as when 
the failure of power supply forces the unavailability of connected components. Common-Cause 
Failures—due to earthquake excitation, vibration, environmental conditions (temperature, humidity, 
contaminants), shared maintenance. Cascading events such as the failure of one component might 
overload other components 


able threat to the reliability of systems. CCF modelling and quantification attracts 
keen interest from system reliability and safety researchers, as well as practitioners. 
Examples of the work that has been done in this field can be found in [28, 33, 37]. 
Most of the methods presented in these publications, however, are built on reliability 
evaluation techniques that do not segregate the topological from the probabilistic 
attributes of the system. As such, they are computationally expensive for problems 
involving multiple reliability analysis of the same system. They also have yet to be 
applied to multi-state systems, as well as systems susceptible to both cascading and 
common-cause failures. 

Cascading failures are those with the capacity to trigger the instantaneous failure 
of one or more components of a system. They can originate from a component or from 
a phenomenon outside the system boundary. The likelihood of the initiating event 
originating from within the system distinguishes them from CCF. Another point of 
dichotomy is that the affected components do not necessarily have to be similar or 
fail in the same mode. In addition, at the occurrence of the initiating event, the prob- 
ability of all the coupled components failing is unity, same for the case when they are 
in a state rendering them immune [15, 18]. A few prominent examples of initiating 
events external to the system are extreme environmental events, natural disasters, 
external shocks, erroneous human-system interactions and terrorist acts. Various 
models have been developed to study the effects of cascading failures on complex 
systems [29]. However, a good number of these models only assess their response 
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to targeted attacks, variation in some coupling factor or the relative importance of 
system components. When faced with the additional situation of random component 
failures, a complete reliability and availability analysis should be performed [18]. 
Even methods that fulfill this requirement have their applicability hampered by com- 
ponents that undergo non-Markovian transitions, components susceptible to delayed 
transitions, and reconfigurable systems. 


7.3 Load Flow Simulation 


The load flow simulation is a recently proposed technique for the reliability and 
performance analysis of multi-state systems [17]. It is based on the fact that if the 
performance levels of a system’s components are known, the performance levels 
of the system can be directly derived from its network model. In this formalism, 
each component is modelled as a semi-Markov stochastic process and the system 
as a directed graph whose nodes are the components of the system. The approach 
is intuitive and applicable to any system architecture and easily programmable on 
a computer. It outperforms other multi-state system reliability analysis approaches, 
since it does not require state enumeration or cut set definition. Efficient algorithms 
for manipulating the adjacency matrix of this directed graph to obtain the flow equa- 
tions of the system are available in OpenCossan [31]. 

The operation of the system is simulated using Kinetic Monte Carlo method by 
initially sampling the state and time to the next transition (hereafter referred to as 
transition parameters) of each component. The simulation jumps to the smallest sam- 
pled transition time tmin, at which time the states of the components undergoing the 
transition are updated. Using the updated performance levels of the components of 
the system, the virtual flow across the system is computed via a linear programming 
procedure that employs the interior-point algorithm. The new transition parameters 
of the components undergoing a transition are then sampled and the simulation jumps 
to the next smallest transition time. This cycle of component transition parameter 
sampling, transition forcing and system performance computing continues until the 
mission time T is reached. The system performance computed at every component 
transition is captured and saved in counters, from which the performance indices of 
the system can be deduced. A component shutdown and restart procedure is incor- 
porated to replicate the actual operating principles of most practical systems. In this 
procedure, the availability of each system component is tested against its predefined 
reference minimum input load level at every transition and the effects of functional 
interdependence on the failure probability of the components are accounted for. 
Figure 7.2 provides a high-level illustration of the load flow simulation procedure. 

Ageing and component performance degradation is common in most systems. 
For such systems, techniques built around the flow conservation principle become 
obsolete, as the flow generated by sources can be dissipated in intermediate com- 
ponents in certain failure modes. For instance, consider a 100 MW power generator 
supplying a 95 MW load through a 125 MW transformer. If there are no power losses 
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Fig. 7.2 Flowchart of the load flow simulation 


in the transformer, 95 MW will be drawn from the generator and delivered to the load. 
However, if the efficiency of the transformer deteriorates to say 75%, it now takes 
all 100 MW from the generator but delivers only 75 MW to the load. In both cases, 
the apparent difference between the generation capacity and demand is the same but 
the power drawn from the generator increases while the effective power supplied 
to the load deteriorates. For this example, the demand would have to be slashed to 
75 MW or less, to preserve the operational integrity of the generator. Other scenarios 
where component inefficiency affects system reliability are: a power transmission 
line prone to losses and an oil pipeline where a failure mode is a hole in a pipe or 
gasket failure at some flange [17]. 

The load flow simulation approach has been successfully applied to the availabil- 
ity assessment of a reconfigurable offshore installation [18], dynamic maintenance 
strategy optimization of power systems [19] and the probabilistic risk assessment of 
station blackout accidents in nuclear power plants [16]. 


Advantages Over Existing Techniques: 


1. Inherits all the advantages of simulation approaches used for system reli- 
ability and performance evaluation. 

2. Implements any system structure with relative ease, since it doesn’t require 

knowledge of the minimal path or cut sets prior to system analysis. 

Calculates the actual flow across every node of the system. 

4. Models systems made up of multiple source and sink nodes with competing 

static or dynamic demand. 

Models losses in components and across links. 

Models component restart and shutdown. 

7. Not limited to integer-valued node capacities and system demand, as 
required by other graph-based algorithms. 


iS) 


NS A 


7.3.1 Simulation of Interdependent and Reconfigurable 
Systems 


Load flow simulation allows the modelling of inter-component and inter-system 
dependencies, thereby supporting the reliability assessment of realistic engineering 
systems [18]. Components and external events that influence the operation of the 
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Fig. 7.3 Illustration of decoupling procedure for interdependent systems 


system are identified and numbered, followed by the identification and modelling of 
all the inter-component dependencies in the system. The strategy is to decouple the 
interdependent system into its constituent systems (subsystems) as shown in [18]. 
The nodes associated with each subsystem are then identified and its directed graph 
obtained (i.e. only nodes with actual commodity flow are considered). The states of 
each node are then identified and modelled as described in [17]. 

For illustrative purposes, consider the original system in Fig. 7.3 (left panel). It is 
an interdependent four commodity system—each solid line transports a commodity 
and the broken line depicts an induced dependency in the direction of the arrow. 
Node 2 is part of subsystem Sz and relies the commodity from subsystem $3 to 
drive its operation. One would say it is functionally dependent on subsystem $3 and 
exhibits a dual operation mode, operating both as a sink and an intermediate node. 
Its sink mode directly influences flow in $3, while its transmission mode directly 
influences flow in S2. It is, therefore, logical to separate node 2 into its constituent 
nodes, each representing a mode of operation. Virtual nodes representing the sink 
modes of dual nodes are created and assigned new IDs, creating a decoupled system 
(see Fig. 7.3 (right panel)). A load-source functional dependency exists between the 
decoupled nodes, since the transmission node is incapacitated if flow into the sink 
node is inadequate. Therefore, they make a load-source pair, with the transmission 
node being the load and the sink node, the local source node. 

Local sources, otherwise known as support nodes in load-source pairs, are mod- 
elled as binary-state objects: state 1 (active) has capacity l, depicting the availability 
of the dependent node; State 2 (inactive) has capacity 0 and depicts its unavailabil- 
ity. l is the minimum level of support required to operate the dependent/sink node 
and in practical cases represents the load rating of that component. By applying the 
decoupling procedure described to all load dependency relationships in the system, 
the following load-source pairs; {2, 14}, {3, 16}, {1, 18}, {13, 15} and {9, 17} are 
obtained. L; = {/, /} signifies that node i requires a minimum of / units of a certain 
commodity from node j to operate. If i has a load dependency relationship with 
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multiple nodes, L; takes the form of a two-column matrix, with each row defining 
the node’s relationship with another node. 

Induced dependencies are defined by the parameter D; = {dj1, dj2, dj3, djahuxa | 
j =1,2,...,u — 1l, u, which defines the state change induced in other nodes as a 
result of a state change in node i. dj, is the state of i triggering the cascading event, 
dj2; the affected node, dj3; the state the node has to be in to be affected, and d 4; its 
target state on occurrence of the event. Each row of D; defines the behaviour of an 
affected node, and u, the number of relationships. If node i and the affected node 
dj belong to different subsystems, the subsystem the latter belongs to is dependent 
on the subsystem of the former. For example, suppose state 2 of node 5 in Fig. 7.3 
forces node 7 into state 3 if it is in state | at the time node 5 makes the transition to 
state 2. The induced dependency of node 7 on node 5 is defined by Ds as 


Ds = (2713) (7.1) 


Once the system has been decoupled, the dependency tree depicting the relation- 
ships between its subsystems and their ranking is derived. The rank of a subsystem 
depends on its position on the tree relative to the reference subsystem. The indepen- 
dent subsystem, which is also the reference subsystem, is assigned rank | and the 
remainder ranked in ascending order of their longest distance from this reference. 
See [18] for the details of the ranking, reconfiguration and simulation procedures. 


7.3.2 Maintenance Strategy Optimization 


The load flow simulation approach can be exploited to optimise the maintenance 
strategies of complex systems. The multi-state semi-Markov models of components 
are extended to represent their behaviour under various maintenance strategies. The 
operation of the system is then simulated using a slightly modified version of the 
simulation procedure depicted in Fig. 7.2 and detailed in [19]. Non-Markovian com- 
ponent transitions associated with the operational dynamics imposed by maintenance 
strategies are implemented. For example, the maintenance of a failed component can 
only be initiated if there is an idle maintenance team, making the transition of the 
component from its failed to working state non-Markovian, since it is conditional on 
the availability of a maintenance team. Additional component states such as preven- 
tive maintenance, corrective maintenance, shutdown, diagnostics, idle and awaiting 
maintenance are included to model different maintenance activities. 

To illustrate the derivation of the multi-state model of a component under various 
maintenance strategies, consider a binary-state component. The component is subject 
to both preventive and corrective maintenance and maintained by a limited number 
of maintenance teams. In addition, its corrective maintenance consists of two stages: 
a diagnosis stage and a restoration stage. Following diagnosis, the maintenance team 
could proceed with the actual repairs if spares are not required or make a spares 
request. There is a known probability associated with spares being needed for a repair 
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iit: > Forced Transition 


Kept out of operation ~~ > Conditional Transition Returned into operation 
during spares delay ——> Normal Transition during spares delay 


Fig. 7.4 Multi-state models of binary-state component under maintenance delays 


and while the maintenance team awaits the spares, it could be assigned to another 
component. Similarly, there is a probability associated with spares being needed to 
complete the preventive maintenance of the component, which could be interrupted 
if these spares are not immediately available. The resulting multi-state models of 
the component under two contrasting maintenance strategies are shown in Fig. 7.4, 
with the component’s state assignments and possible transitions. Transitions are 
either normal, forced or conditional. Normal transitions occur randomly and depend 
only on their associated time-to-occurrence distributions. Forced transitions occur 
purely as a consequence of events outside the component boundary, and their time- 
to-occurrence distributions are unknown. Conditional transitions, on the other hand, 
have a known time-to-occurrence distribution but are assigned a lower priority and 
only occur on fulfilment of a predefined probabilistic condition or set of conditions 
[19]. Unlike normal transitions in which the next state of the component depends 
only on its current state, the next state of the component under forced transitions 
may also depend on its previous state. As such, the multi-state component transition 
parameter sampling procedure presented in [17] cannot be used to determine the 
transition parameters of the component. For this, the set of procedures presented in 
[19] are required. The binary-state component models in Fig. 7.4 can be generalised 
for multi-state components by defining one ‘Idle’ state (if components are kept out 
of operation during spares delay), a ‘Diagnosis’ state (where necessary) and one 
‘Corrective Maintenance’ state for each repairable failure mode. 

With this approach, multiple contrasting complex maintenance strategies can be 
simulated without the need to modify the simulation algorithm, as the maintenance 
strategy is implemented at the component level. See, for instance, the optimal main- 
tenance strategies for a hydroelectric power plant derived in [19]. 
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7.3.3 Case Study: Station Blackout Risk Assessment 


The complete lack of AC power at a nuclear power plant is critical to its safety, since 
AC power is required for its decay heat removal. Though designed to cope with 
these incidents, nuclear power plants can only do so for a limited time. The impact of 
station blackouts on a nuclear power plant’s safety is determined by their frequency 
and duration. These quantities, however, are traditionally computed via a static fault 
tree analysis that deteriorates in applicability with increasing system complexity. 
The load flow simulation approach was used to quantify the probability and dura- 
tion of possible station blackouts at the Maanshan Nuclear Power Plant in Taiwan, 
accounting for interdependencies between system components, maintenance, system 
reconfiguration, operator response strategies and human errors [16]. 

The Maanshan Plant is powered through two physically independent safety buses, 
which themselves are powered by six offsite power sources through two independent 
switchyards. Each safety bus has a dedicated backup diesel generator and both buses 
share a third diesel generator. Two gas turbine generators connected through the 
second switchyard power the plant’s safety systems if all three diesel generators are 
unavailable. The gas turbine generators, however, take about 30 min to become fully 
operational, when powered on. The goal in this case study was to quantify the risk 
to the plant, of station blackouts initiated by the failure of the grid sources, as well 
as the switchyards and identify the best recovery strategy, to minimise this risk. 

The load flow simulation approach was used to model the structural/functional 
relationships between the components of the system as described in Sect.7.3 and 
the formalism described in Sect. 7.3.1 to model both the interdependencies between 
components and their dynamic behaviour under various recovery strategies. The full 
details of the solution approach and results are available in [16]. 


7.4 Survival Signature Simulation 


For very large-scale systems and networks, the full system structure information (or 
structure function, minimal paths sets, etc.) might not be available or may be difficult 
to obtain. Having a compact representation of the system, therefore, is advantageous. 

Survival signature [7] has been proposed as a generalisation of system signature 
[11, 12] to quantify the reliability of complex systems consisting of independent 
and identically distributed (iid) or exchangeable components, with respect to their 
random failure time. It has been shown in [8] how the survival signature can be 
derived from the signatures of two subsystems in both series and parallel config- 
uration. The authors developed a non-parametric-predictive inference for system 
reliability using the survival signature. Aslett et al. [3] demonstrated the applicabil- 
ity of the survival signature to system reliability quantification via a parametric, as 
well as non-parametric approach. An efficient computational approach for computing 
approximate and exact system and survival signatures has been recently presented 
in [20, 34]. Feng et al. [13] developed an analytical method to calculate the sur- 


106 H. George-Williams et al. 


Fig. 7.5 Example of a bridge network composed of six-component of two types 


Table 7.1 Survival signature for the system shown in Fig. 7.5 


(l, 12) 


W| N 


1/3 [0, 1, 2, 3] 1 


vival function of systems with uncertainty in the parameters of component failure 
time distributions. These methods are all useful but less practical for larger complex 
systems and not applicable to non-exponential transitions. 

As an illustration, consider a six-component bridge network with two component 
types (Fig. 7.5), the survival function is given by Table 7.1. 

Considering 2 working components of type 1; /; = 2 and 3 of type 2; h = 3, 
there are three possible combinations in total but only two combinations lead to 
success (the survival of the system) of the system. Hence, the survival signature of 
the system is 2 as shown in Table7.1. Similarly, for 7; = 3 and /, = [0, 1, 2, 3], 
there are eight possible combinations in total, all of which result in success. Hence, 
the survival signature of the system in this case is equal to 1.0. Thus, knowing the 
success paths from the combinations of multiple types of active components, it is 
possible to compute the survival function of a complex system. 

Exact analytical solutions are restricted to particular cases (e.g. systems with 
component failure times following the exponential distribution and non-repairable 
components). The survival function of a system with K component types is given by 


mı MK K 
ALS) 2) oh, OPKO = kH (1.2) 
h=0 Ik=0 k=1 
where 
K K m 
P( Hew =)=] | ( jnor-n — F(t)" (7.3) 
Ik 
k=1 k=1 
Here, C(t) € {0, 1, ..., mg} denotes the number of components of type k in the 


system which function at time ¢, and F(t) represents the C DF of the random failure 
times of components of the different types. In this approach, we have a strong iid 
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Fig. 7.6 Flow chart of the Monte Carlo simulation algorithm for complex systems with repairable 
components based on survival signature. Details of the simulation method are available in [30] 


assumption of failure times within same components types. With this assumption, 
all state vectors [7] are equally likely to occur. 

However, simulation methods can be applied to study and analyse any system, 
without introducing simplifications or unjustified assumptions. A Monte Carlo-based 
approach can be combined with survival signature, to estimate the reliability of 
a system in a simple and efficient way. A possible system evolution is simulated 
by generating random events (i.e. the random transition such as failure times of 
the system components) and then estimating the status of the system based on the 
survival signature (Eq. (7.2)). By counting the number of occurrences of a specific 
condition (e.g. the number of times the system is in working status), it is possible to 
estimate the survival function and reliability of the system. 

The most generally applicable Monte Carlo simulation methods adopting the sur- 
vival signature for multi-state component and repairable systems have been proposed 
in [30]. Its procedural steps are presented in Fig. 7.6. 


7.4.1 Systems with Imprecision 


The reliability analysis of complex systems requires the probabilistic characterisation 
of all the possible component transitions. This usually requires a large dataset that is 
not always available. To avoid the inclusion of subjective assumptions, imprecision 
and vagueness of the data can be treated by using imprecise probabilities that combine 
probabilistic and set theoretical components in a unified construct (see, e.g. [4, 9]). 
Randomness and imprecision are considered simultaneously but viewed separately 
at any time during the analysis and in the results [32]. 
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Imprecision can occur at component level, where the exact failure distribution is 
not known or at system level, in the form of an imprecise survival signature. The latter 
occurs when part of the system can be unknown or not disclosed. Such imprecision 
leads to bounds on the survival function of the system, providing confidence in the 
analysis, in the sense that it does not make any additional hypothesis regarding to 
the available information. When the imprecision is at the component level, a naïve 
approach, employing a double loop sampling approach where the outer loop is used 
to sample realisations of component parameters, can be used. In other words, each 
realisation defines a new probabilistic model that needs to be solved adopting the 
simulation methods proposed above, from which the envelop of the system reliability 
is identified. However, since almost all systems are coherent (a system is coherent if 
each component is relevant, and the structure function is nondecreasing), it is only 
necessary to compute the system reliability twice, using the lower and upper bounds 
for all the parameters, respectively. If the imprecision is at the system level (i.e. in 
the survival signature), the simulation strategy proposed in Fig. 7.6 can be adopted 
without additional computational cost by collecting, in two separate counters, the 
upper and lower bounds of the survival signature at each component transition, as 
illustrated in [30]. Hence, imprecision at the component and system levels can be 
considered concurrently, without additional computational costs. 


7.4.2 Case Study: Industrial Water Supply System 


An industrial water supply system consisting of 13 components, as shown in Fig. 7.7, 
is chosen as a case study, to demonstrate the capability of the survival signature 
method. The system is expected to deliver water to at least one of the two tanks 
T2 or T3 from tank T1, through a set of motor-operated pumps and valves. The 
component failure data with the corresponding distributions are provided in Table 7.2. 
The survival signature method is employed to compute the reliability of the system. 


Tank T2 Valve V3 


Valve va 


Vaive V6 


Tank T3 Valve V5 


Fig. 7.7 Industrial water supply system 
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Table 7.2 Reliability parameters of the components of the water supply system 


Component Failure rate MTTR (h) Repair rate Distribution 
(a!) (h7!) type 

T,, Tn, T3 Ay =5-107> | 24 fy = 0.0417 | Exponential 

Pi, P2, P3 Ao =3-1073 |17.4 u2 = 0.0575 | Exponential 


Vi, Vo, V3, Va, Vs, Vo, VI 43 =2-107-* |9 u3 =0.111 Exponential 


Table 7.3 Survival signature (selected parts only) for the system shown in Fig. 7.7 computed with 
approach proposed in [20] 


l h b3 ® lı h b ® 
[0, 1] y y 0 

2 1 2 1/63 3 1 2 1/21 
2 1 5 8/63 3 1 5 8/21 
2 1 7 2/9 3 1 7 2/3 
2 2 6 22/63 3 2 6 6/7 
2 3 5 8/21 3 3 5 6/7 
2 3 7 2/3 3 3 7 1 


The components of the system are categorised into three types, namely, pumps, 
tanks and valves. The survival signature is given in Table 7.3. The survival function 
of the water system is then calculated analytically as shown below: 


P(T >t) = DPD > (li, h, wha ‘a eh fem)! x 


=0 =0 h= 
C)u-er ents E, 
2 3 


The resulting survival functions without repair and with repairable components are 
shown in Fig. 7.8. 

As shown in Fig. 7.8, the results of the simulation method are in agreement with 
the analytical solution for both repairable and non-repairable components. The pro- 
posed simulation method is applicable to any distribution type, intervals or even 
probability boxes. It not only separates the system structure from its component fail- 
ure time distributions, but also doesn’t require the iid assumption between different 
component types, as illustrated in [14]. 
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Fig. 7.8 Survival function without repairable (left panel) and with repairable components (right 
panel) computed using 10000 samples and verified by the analytical solutions 


7.5 Final Remarks 


System topological complexity, component interdependencies, multi-state compo- 
nent attributes and complex maintenance strategies inhibit the application of sim- 
ple reliability engineering reasoning to systems. For systems characterised by these 
attributes, simulation-based approaches allow the realistic analysis of their reliability, 
despite the relatively higher computational costs of these approaches. This, however, 
is not a problem, with recent advancement in computing. 

The load flow simulation approach is an intuitive simulation framework that is 
applicable to binary and multi-state systems of any topology. It does not require the 
prior definition of the structure function, minimal cut sets or the minimal path sets of 
the system. Instead, it employs a linear programming algorithm and the principles of 
flow conservation to compute the flow through the system. Thus, it can model flow 
losses and implement reconfiguration requirements relatively easily. It can model 
all forms of interdependencies in realistic systems, using intuitive representations. 
These attributes render the framework intuitive and generally applicable. 

While the load flow simulation approach is optimised for multi-state systems, it 
may not be the best for binary-state systems with identical components. Since the 
survival signature is a function of the system topology only, it can be calculated only 
once and reused in multiple reliability analyses. This feature reduces the reliability 
evaluation of the system to the analysis of the failure probabilities of its components, 
which is computationally cheap. Efficient simulation methods based on system sur- 
vival signature allow the reliability analysis of complex systems without resorting to 
simplifications or approximations. 

The load flow and survival signature simulation approaches are not alternative to 
each other; instead, they can be coupled to take advantage of their unique features, 
especially for systems with multiple outputs and potentially, multi-state systems. 
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The algorithms and examples presented are available at: https://github.com/ 
cossan-working- group/SystemReliabilityBookChapter. 
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Chapter 8 A) 
Overview of Stochastic Model Updating PAE 
in Aerospace Application Under 

Uncertainty Treatment 


Sifeng Bi and Michael Beer 


Abstract This chapter presents the technique route of model updating in the pres- 
ence of imprecise probabilities. The emphasis is put on the inevitable uncertainties, 
in both numerical simulations and experimental measurements, leading the updat- 
ing methodology to be significantly extended from deterministic sense to stochas- 
tic sense. This extension requires that the model parameters are not regarded as 
unknown-but-fixed values, but random variables with uncertain distributions, i.e. the 
imprecise probabilities. The final objective of stochastic model updating is no longer 
a single model prediction with maximal fidelity to a single experiment, but rather 
the calibrated distribution coefficients allowing the model predictions to fit with 
the experimental measurements in a probabilistic point of view. The involvement 
of uncertainty within a Bayesian updating framework is achieved by developing a 
novel uncertainty quantification metric, i.e. the Bhattacharyya distance, instead of 
the typical Euclidian distance. The overall approach is demonstrated by solving the 
model updating sub-problem of the NASA uncertainty quantification challenge. The 
demonstration provides a clear comparison between performances of the Euclidian 
distance and the Bhattacharyya distance, and thus promotes a better understand- 
ing of the principle of stochastic model updating, as no longer to determine the 
unknown-but-fixed parameters, but rather to reduce the uncertainty bounds of the 
model prediction and meanwhile to guarantee the existing experimental data to be 
still enveloped within the updated uncertainty space. 
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8.1 Introduction 


Computational models of large-scale structural systems with acceptable precision, 
robustness and efficiency are critical, especially for applications where a large amount 
of experimental data is hard to be obtained such as the aerospace engineering. Model 
updating has been developed as a typical technique to reduce the discrepancy between 
the numerical simulations and the experimental measurements. Recently, it is a ten- 
dency to consider the inevitable uncertainties involved in both simulations and exper- 
iments. A better understanding of the discrepancy between them, in the background 
of uncertainty, would achieve a better outcome of model updating. 

In structural analysis, the source of uncertainties, i.e. the reason of the discrep- 
ancy between simulations and measurements, can be classified into the following 
categories: 


e Parameter uncertainty: Imprecisely known input parameters of the numerical 
model, such as material properties of novel composites, geometry sizes of complex 
components and random boundary conditions; 

e Modelling uncertainty: Unavoidable simplifications and idealisations, such as lin- 
earised representations of nonlinear behaviours and frictionless joint approxima- 
tions; 

e Experimental uncertainty: Hard-to-control random effects, such as environment 
noise, measurement system errors and human subjective judgments. 


The above uncertainties motivate the trend to extend model updating from the 
deterministic sense to the stochastic sense. The stochastic updating techniques draw 
massive attention in the literature, in which the majority is based on the framework of 
imprecise probability [4]. Considering the very typical categorisation of uncertain- 
ties, the term “imprecise probability” can be understood separately as “probability” 
corresponding to the aleatory uncertainty, and “imprecise” corresponding to the epis- 
temic uncertainty. The epistemic uncertainty is caused by the lack of knowledge. As 
a better understanding of the problem is achieved, this part of uncertainty can be 
reduced by model updating. The aleatory uncertainty represents the natural random- 
ness of the system, such as the random wind load on launch vehicles, manufacture 
and measurement system errors. This part of uncertainty is irreducible, however, an 
appropriate quantification of the aleatory uncertainty is still required in stochastic 
model updating. 

The involvement of aleatory and/or epistemic uncertainties provides a clear logic 
for the categorisation of model input parameters: 


I Parameters without any aleatory or epistemic uncertainty, appearing as explicitly 
determined constants; 
II Parameters with only epistemic uncertainty, which are still constants, however, 
with an undetermined position in an interval; 
HI Parameters with only aleatory uncertainty, which are no longer constants, but 
presented as random variables with exactly known distribution characteristics; 
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IV Parameters with both aleatory and epistemic uncertainties, which are random 
variables with undetermined distribution characteristics. 


The above parameters with various uncertainty characteristics lead the model 
predictions into the fourth category, i.e. the outputs with imprecise probabilities. 
Consequently, the objective of stochastic model updating is no longer a single model 
prediction with maximal fidelity to a single experiment, but rather a minimised uncer- 
tainty space of the outputs, whose bounds should still encompass the existing exper- 
imental data. In order to achieve this objective, a novel Uncertainty Quantification 
(UQ) metric based on the Bhattacharyya distance is introduced in this chapter. The 
UQ metric refers to an explicit value quantifying the discrepancy between simulations 
and measurements. Clearly, this metric is expected to be as comprehensive as pos- 
sible to capture all sources of uncertainty information simultaneously. Furthermore, 
the overall updating procedure is committed to being simple enough, by employing 
a uniform framework applicable to both of the classical Euclidian distance and the 
novel Bhattacharyya distance. Within this uniform framework, comparison between 
these two distance-based metrics can be performed conveniently, and therefore a bet- 
ter understanding of the difference between the deterministic and stochastic updating 
is achieved with significantly reduced calculation cost. 

The following parts of this chapter are organised as follows. Section 8.2 gives 
an overview of the state of the art of deterministic and stochastic model updating 
where key literature is provided. Section 8.3 presents the overall technique route with 
description of the key aspects, which can be helpful to generate a preliminary figure 
of the overall model updating campaign. As the emphasis, the parameter calibra- 
tion procedure with uncertainty treatment is explained in Sect. 8.4, where the Bhat- 
tacharyya distance-based UQ metric is presented along with the Bayesian updating 
framework. The NASA UQ challenge problem is solved by the proposed approach 
and some interesting results are compared and analysed in Sect. 8.5. Section 8.6 
presents the conclusions and prospects. 


8.2 Overview of the State of the Art: Deterministic 
or Stochastic? 


Although this chapter focuses on the stochastic model updating, the deterministic 
updating is still the footstone of its stochastic extension. A comprehensive review of 
the deterministic model updating techniques can be found from Ref. [14]. The readers 
are also suggested to refer to the fundamental book by Friswell and Mottershead 
[15] on this subject covering key aspects including model preparation, vibration data 
acquisition, sensitivity analysis, error localisation, parameter calibration, etc. 
Among the plentiful techniques for deterministic parameter calibration, the 
sensitivity-based method is one of the most popular approaches based on the lin- 
earisation of the generally nonlinear relation between the model inputs and outputs. 
Mottershead et al. [20] provide tutorial literature for the sensitivity-based updat- 
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ing procedure of finite element models with both demonstrative and industry-scale 
examples. However, the sensitivity-based method is only valid for typical outputs, 
e.g. natural frequencies, modal shapes and displacement, of modal analysis or static 
analysis. Other more complex applications such as strong nonlinear dynamics or 
transient analysis lead the analytically solved sensitivity to be unpractical. Conse- 
quently, the random sampling method, more specifically the Monte Carlo method, 
attracts more and more interest by providing a direct connection between the model 
parameters and any output features via multiple deterministic model evaluations. 
The rapid development of computational hardware makes it possible for large-size 
samples, from which the statistical information of the inputs/outputs can be precisely 
estimated [17, 18]. Such Monte Carlo-based methods have been successfully applied 
in large-scale structures, see e.g. Refs. [10, 16]. 

The widely used Monte Carlo methods obviously benefit to the research of 
stochastic model updating, meanwhile, its conjunction with the Bayesian theory 
further promotes this topic. Beck and Katafygiotis [3] proposed the fundamental 
framework of Bayesian updating, which was further developed via the Markov Chain 
Monte Carlo (MCMC) algorithm by Beck and Au [2]. The Bayesian updating frame- 
work in conjunction with the MCMC algorithm possesses the advantage to capture 
the uncertainty information presented by rare experimental data. This approach has 
been developed as a standard solution of stochastic model updating for different 
applications such as uncertainty characterisation [22]. 

Besides the Bayesian interface, other imprecise probability techniques also have 
considerable potential to be applied in stochastic model updating, such as interval 
probabilities [13], evidence theory [21], info-gap theory [5] and fuzzy probabilities 
[19]. For the background of imprecise probability, the comprehensive review by Beer 
et al. [4] is suggested for an overall understanding of this topic. 


8.3 Overall Technique Route of Stochastic Model Updating 


The implement of stochastic model updating requires a complete theoretic system 
including various key steps from the originally developed model, with a series of 
techniques to define the features, to select and calibrate the parameters, to locate 
and reduce the modelling errors, and finally to validate the model with indepen- 
dent measurements. Special treatment of uncertainty propagation and quantification 
promotes the extension of model updating from deterministic domain to stochastic 
domain. This extension is specifically implemented to key steps such as parameter 
calibration, model adjustment and validation. Overview of all related steps in model 
updating is provided in the following subsections. 
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8.3.1 Feature Extraction 


A numerical model cannot be universally feasible for all scenarios with different 
output features. Here, the feature is defined as the quantity that the engineer wants to 
predict with the model and it is also dependent on the capacity of the practical exper- 
imental set-up. Clearly, different features lead to different sensitivities of parameters, 
and require different strategies for uncertainty quantification. And therefore the first 
step of model updating is to define a suitable feature according to the existing exper- 
imental setup and the practical application. The most typical features in structural 
updating are the modal quantities, e.g. the natural frequencies and modal shapes. Mul- 
tiple orders of frequencies constitute a vector and the absolute mean error between 
two vectors can be easily utilised to quantify the discrepancy between the simulated 
and measured data. For modal shapes, the Modal Assurance Criterion (MAC) [1] 
is the most popular tool to quantify the correlation between two sets of eigenvec- 
tors. The continuous quantities are also commonly utilised as features, such as the 
displacement response in time domain and Frequency Response Functions (FRFs) 
in frequency domain. Classical techniques to evaluate the difference between two 
complex and continuous quantities are the Signature Assurance Criterion (SAC) and 
Cross Signature scale Factor (CSF). Reference [8] provides an integrated application 
of SAC and CSF for a comprehensive comparison between two FRFs. 


8.3.2 Parameter Selection 


A sophisticated model of a large structure system always contains a massive number 
of parameters, which lead to a huge calculation burden and even failure of the updat- 
ing procedure. Parameter selection is therefore a key step to select or filter parameters 
according to their significances to the features defined in the first step. The core tech- 
nique for parameter selection is the sensitivity analysis focusing on a quantitative 
measurement of the parameter significance. The classical sensitivity analysis tech- 
nique is the Sobol variance-based method [25]. For a comprehensive knowledge of 
the global sensitivity analysis inspired by Sobol’s method, the well-written book by 
Saltelli et al. [24] is suggested to the readers. Another extension of Sobol method 
includes, e.g. the Analysis of Variance (ANOVA) based on the hypothesis testing in 
probabilistic theory. Reference [7] proposes an integrated application of ANOVA and 
Design of Experiment (DoE), which provides a significant coefficient matrix con- 
taining the complete sensitivity information of a multiple parameter—multiple feature 
system. When uncertainties are involved, the sensitivity analysis requires extension 
from the deterministic procedure to the stochastic procedure. The sensitivity of a 
certain parameter, in the background of uncertainty treatment, can be represented as 
the degree of how much the uncertainty space of the features is reduced, when the 
epistemic uncertainty space of the parameter is completely reduced. This requires 
additional techniques for uncertainty propagation and quantification, which will be 
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addressed in the following parameter calibration step. A comprehensive literature 
review on the subject of sensitivity analysis can be found in Ref. [26]. 


8.3.3 Surrogate Modelling 


The employment of surrogate models, known as mate-models in some literature, 
becomes increasingly important along with the sampling-based updating method- 
ologies where a large number of model evaluations are generally required. A surro- 
gate model is a fast-running script between inputs and outputs, which can replace 
the time-consuming model, e.g. the large finite element model, in the updating pro- 
cedure. An original input/output sample, i.e. a training sample, generated from the 
complex model is required to train the surrogate model. Since the surrogate model is 
proposed to handle the conflict between efficiency and precision, the training sample 
is expected to be as small as possible, while the precision of the surrogate model 
according to the original model should be high enough. The typical types of surro- 
gate models include the polynomial function, radial basis function, support vector 
machine, Kriging function neural network, etc. The selection of a suitable surrogate 
model type is determined by various aspects such as its efficiency, generality, and 
nonlinearity. Another technique, in conjunction with the surrogate modelling, is the 
DoE with the purpose to efficiently and uniformly configure a spatial distribution of 
the training sample within the whole parameter space. A comprehensive review of 
the existing techniques of surrogate modelling and DoE can be found in Ref. [27]. 


8.3.4 Test Analysis Correlation: Uncertainty Quantification 
Metrics 


Test Analysis Correlation (TAC) is the core step of the overall updating procedure, 
not only because it significantly influences the updating outcome but also because 
it is the part mostly extended by the uncertainty treatment. TAC refers to the pro- 
cess to quantitatively measure the agreement (or lack thereof) between test mea- 
surements and analytical simulations, taking uncertainties into account. It therefore 
requires a comprehensive metric which is capable of capturing multiple uncertainty 
sources simultaneously. This chapter proposes UQ metrics under various distance 
concepts. The Euclidian distance, i.e. the absolute geometry distance between two 
single points, is probably the most common metric used in deterministic updating 
approaches. However, it becomes insufficient for stochastic updating where mul- 
tiple simulations and multiple tests are presented. The Mahalanobis distance is a 
weighted distance considering the covariance between two datasets. And the alter- 
native Bhattacharyya distance is a statistical distance measuring the overlap between 
two random distributions. A comprehensive comparison among the three distances 
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in model updating and validation can be found in Ref. [9], where the Bhattacharyya 
distance is found to be more comprehensive to capture more sources of uncertainty 
information. In the example section, the Bhattacharyya distance-based UQ metric 
is applied within a Bayesian updating framework, and the result is compared with 
the one using the typical Euclidian distance. This work does not address the Maha- 
lanobis distance, since it has been found to be infeasible for parameter calibration, 
although it contains the covariance information among multiple outputs. Neverthe- 
less, the Mahalanobis distance has the potential to be utilised in model validation as 
demonstrated in Ref. [9]. 


8.3.5 Model Adjustment and Validation 


After the model parameters are calibrated, the model still needs to pass the validation 
procedure before it can be utilised in a practical application. The validation procedure 
contains a series of criteria with increasing requirements: (1) The updated model 
should predict the existing measurements; (2) The updated model should predict 
another set of measurement data which is different from the ones used for updating; 
(3) The updated model should predict any modification of the physical system by 
making the same modification on the model; (4) The updated model, when utilised as 
a component of a whole system, should improve the prediction of the whole system 
model. In the background of uncertainty treatment, the fit between the prediction 
and the measurement should be assessed by not only the precision but also the 
stochastic characteristics, e.g. probabilities, intervals and probability boxes. Model 
adjustment is a procedure to deal with the modelling uncertainty. When the updated 
model fails to fulfil the validation criteria, or some updated parameters are found to 
be unphysical, e.g. a minus density value, the modelling uncertainty is too severe 
to be compensated by calibrating the parameters. The model can be adjusted by 
increasing the resolution, changing the element type and adding a more detailed 
geometry description, etc. Another round of parameter calibration is performed for 
the adjusted model until the validation criterion is found to be fulfilled. 


8.4 Uncertainty Treatment in Parameter Calibration 


8.4.1 The Bayesian Updating Framework 


The typical Bayesian model updating methodology is based on the following Bayes’ 
equation: 
Pr (XexplO)P@) 


P(O0|Xexp) = PX) 
exp 


(8.1) 
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with the key elements described as follows: 


P(@) is the prior distribution of the parameters, representing the prior knowledge 
before model updating; 

P(@|X_xp) is the posterior distribution of the parameters conditional to the existing 
measurement, i.e. P(@|X_,,) is the outcome of Bayesian updating; 

P (Xexp) is the so-called “normalisation factor” guaranteeing the integration of the 
posterior distribution P(O|X_,,,) equal to one; 

Pr (Xexpl0) is the likelihood function defined as the probability of the existing 
measurements conditional to an instance of the parameters. 


The likelihood represents the probability of the measurement data under each 
instance of the updating parameters @. And thus the objective of model updating in the 
Bayesian background is expressed as: to find the specific instance of the parameters 
allowing the experimental measurement to possess the largest probability, in other 
words, allowing the likelihood Pz (Xexp|0) reach the maximum. See Chap. | for 
additional background on Bayesian inference. 

However, one of the difficulties in Bayesian updating is relative to the normal- 
isation fact P(X_.,,). The direct integration of the posterior distribution over the 
whole parameter space is quite difficult in practical application, especially when the 
number of parameters is large and the distribution format is complex, leading the 
direct evaluation of the normalisation factor impractical. The well-known MCMC 
algorithm is popular to solve this difficulty by replacing Eq. (8.1) with 


P(O|Xexp) © Pr(Xexpl0)° PO) (8.2) 


where £ is the weighting coefficient fallen within the interval [0, 1]. When 6 equals 
to zero, the right part of Eq. (8.2) is the prior distribution; when £ equals to one, the 
right part of Eq. (8.2) converges to the posterior distribution. In the j-th iteration of the 
MCMC algorithm, random samples are generated from the intermediate distributions 
with weighting coefficient £; € [1, 0]. In the (j + 1)-th iteration, parameter points 
which lead to higher likelihood are selected from the random samples in the j-th 
iteration. New Markov chains are generated using the selected parameter points, 
and thus j+ is updated. The intermediate distribution converges to the posterior 
distribution when 6; = 1. The MCMC algorithm has been developed as a standard 
tool to stepwise generate samples from very complex target distributions. For the 
detailed description of the MCMC algorithm, Refs. [2, 11] are suggested to the 
readers. More applications of this algorithm can be found in the fields from stochastic 
model updating [6, 22] to structural health monitoring [23]. See Chap. 2 for additional 
background on Monte Carlo methods. 
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8.4.2 A Novel Uncertainty Quantification Metric 


The MCMC algorithm makes it possible to avoid the evaluation of the normalisation 
fact in Bayes’ equation. However, the evaluation of the likelihood is inevitable and 
it becomes even more critical along with the uncertainty treatment. In the presence 
of multiple sets of measurement data, the theoretical definition of the likelihood is 


Nexp 


Py (Xexpl9) = | | PO«l9) (8.3) 
k=1 


where Nexp is the number of existing experiments. This equation requires accurate 
knowledge of the distribution of each measurement data point P(x;|0). An accurate 
estimation of the distribution requires a large number of samples, which means a 
large number of model evaluations. 

Clearly, a direct evaluation of Eq. (8.3) leads to a huge calculation cost. This is 
why the Approximate Bayesian Computation (ABC) becomes increasingly popular 
in Bayesian applications. Considering the principle of the likelihood function, it 
is natural to propose an approximate function to replace Eq. (8.3), as long as this 
approximate function still contains the information of the existing measurement data 
and an instance of updating parameters. In this work, an approximate likelihood 
based on the Gaussian function is proposed as 


PL (Xexpl9) S 


42 
d(Xexp, Xsim) | (8.4) 


1 
ex 
oJ/2n P| 20? 


where d (Xexp, Xsim) is the distance between the experimental and simulated feature 
data. Equation (8.4) serves as an elegant connection between the Bayesian updating 
framework and the distance concepts. It provides a uniform framework for various 
distance concepts. The approximate likelihood is applicable for not only the Euclidian 
distance but also for the Mahalanobis and Bhattacharyya distances, in a uniform 
updating framework. As explained in Sect. 8.3.4, the Mahalanobis distance is not 
utilised in this work. The Euclidian distance is evaluated as 


lida 


dg (Xexp, Xsim) = [ (Kexp bz Xsim)(Xexp = Xam) (8.5) 


where X,,, and Xsim are the experimental and simulated feature data matrices, respec- 


tively; X denotes the mean vector of the matrix. Clearly, the Euclidian distance only 
handles the mean of the data. The Bhattacharyya distance has the definition as 


dg (Xop, Xsim) == log E Vv Pexp (x) Psim Gas] (8.6) 
Y 
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where P (x) is the PDF of the feature; X is the feature space, implying te is the inte- 
gration performed over the whole feature variable space. More detailed information 
about the evaluation method of the Bhattacharyya distance can be found in Ref. [6]. 


8.5 Example: The NASA UQ Challenge 


The NASA UQ challenge [12] has been developed as a benchmark problem for 
uncertainty treatment techniques in multidisciplinary engineering. This challenge 
problem contains a series of subproblems such as uncertainty characterisation, sen- 
sitivity analysis, uncertainty propagation, etc. In this work, only the Subproblem 
A (Uncertainty Characterisation) is investigated since it is equivalent to the task of 
model updating herein. Figure 8.1 illustrates the key components of this problem, 
which contains one output evaluated via a black-box model using five input param- 
eters. According to the categorisation strategy in Sect. 8.1, the five parameters are 
classified into three categories, as shown in Table 8.1. 

Only the parameters involving epistemic uncertainties, i.e. Categories II and IV 
parameters, require to be updated in this context. The uncertainty characteristics, 
including distribution types and distribution coefficients, are predefined as listed 
in Table 8.1. The predefined intervals of the distribution coefficients represent the 
epistemic uncertainty of the parameters. An output sample with 50 data points is 
available in the problem, which is generated by assigning a set of “true” values of the 
distribution coefficients from the predefined intervals. The objective of this problem 
is, based on the existing 50 output points, to reduce the epistemic uncertainty space of 
the parameters, i.e. to reduce the predefined intervals of the distribution coefficients. 
Although the number of the model parameter is five, there are totally eight updating 
coefficients controlling the epistemic uncertainty space of the parameters, as shown 
in the last a column of Table 8.1. 

To solve the problem, the Euclidian and Bhattacharyya distances are utilised to 
construct the approximate likelihood functions, respectively. The Bayesian updating 


Stochastic 
a. | model updating 


Parameters | 
Category III: Black-box model 
P3 x=h(p,) i=1,....5 


Category IV: 
Pis Pa Ps 


Xoy 


Category II: 


P2 


Fig. 8.1 Key components of the Subproblem A in the NASA UQ challenge 
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Table 8.1 The uncertainty characteristics of the parameters 


Parameters Categories Uncertainty Updating coefficients 
characteristics 

Pl IV Unimodal Beta, 0, = ui, bh = o? 
nı €[0.6, 0.8], 
o? €[0.02, 0.04] 

p2 I Constant, p2 € [0.0, | 63 = p2 
1.0] 

P3 Ill Uniform, u3 = 0.5, No updating required 
o2? = 1/12 

P4, Ps IV Joint Gaussian, 04 = u4, 05 = of 
ui €[—5.0, 5.0], 86 = u5, 07 = ož, 
o? €[0.0025, 4.0], 63 =p 
pe[-1.0, 1.0],i = 4,5 


framework employs these two distances as UQ metrics and generates two indepen- 
dent sets of results. This treatment is intended to make a clear distinction between 
the deterministic updating and the stochastic updating, and furthermore to reveal 
the merits and demerits of these two distances. In practical applications, however, a 
combined application of these two distances is suggested by first using the Euclidian 
distance for the mean updating, and second using the Bhattacharyya distance for 
the variance updating. This two-step strategy has been demonstrated as a success in 
solving this problem in Ref. [6]. 

The posterior distributions of the eight updating coefficients using respectively the 
Euclidian and Bhattacharyya distances as the UQ metrics are illustrated in Fig. 8.2. 
In the figure, the objects (“Samples” or “PDFs”) with the suffix “_ED” denote the 
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Fig. 8.2 Posterior distributions updated using the Euclidian and Bhattacharyya distances 
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results updated using the Euclidian distance metric; and the ones with the suffix 
“BD” denote the results using the Bhattacharyya distance metric. Except the distri- 
bution of 01, most of the posterior distributions with the Euclidian distance metric are 
still close to uniform, implying the deterministic updating procedure employing the 
Euclidian distance is incapable of solving this problem. As an obvious comparison, 
the updating procedure employing the Bhattacharyya distance performs well for most 
of the updating coefficients by providing very peaked posterior distributions, such as 
02, 03, 06, and 67. The vertical line in each subfigure represents the true value of the 
updating coefficient, which were used to generate the existing 50 output samples. 
By comparing the position of the vertical line with the peak of the posterior distribu- 
tion, the updating precision is assessed. For 6), the peak of the distribution with the 
Euclidian distance is apart from the vertical line, while the peak of the distribution 
with the Bhattacharyya distance is quite close to the vertical line. This means the 
Bhattacharyya distance performs better than the Euclidian distance when updating 
6,. The same conclusion is also achieved for 63, 64, and 67. However, for 62, and 65, 
although the distributions with the Bhattacharyya distance have clear poles, they do 
not converge to the vertical lines. The updating precision of these two coefficients is 
possible to be further improved by the two-step strategy as described in Ref. [6]. Note 
that, there are still two coefficients (05 and 6g), which cannot be calibrated by neither 
the Euclidian distance nor the Bhattacharyya distance. A potential explanation is that 
the sensitivity of these two coefficients to the output is extremely low, leading the 
inverse procedure impossible to locate the “true” value. 

The quantitative updating results are detailed in Table 8.2, where the true values 
of the updating coefficients and the updated ones are provided. The updated values 
in the last two columns of Table 8.2 are obtained by determining the exact posi- 
tion of the distribution peaks in Fig. 8.2. For the posterior distributions which are 
still close to uniform distribution, the determining process is meaningless and thus 
omitted. This is why only two updated coefficients are provided in the column with 
the Euclidian distance. Note that, although the true values of the coefficients are 
released in Table 8.2, they are not necessarily to be treated as the final target of this 
updating problem. In the background of uncertainty treatment, the objective of this 
problem, stipulated by the problem designer [12], is to reduce the epistemic uncer- 
tainty space of the parameters, while making sure that the existing output sample can 
still be included in the output uncertainty space. Hence, it makes more sense to assess 
how much the output uncertainty space has been reduced after the intervals of the 
coefficients are reduced in the updating procedure. This is relative to the tasks uncer- 
tainty propagation and model validation, which are out of the scope of this example 
for parameter calibration. A complete validation procedure considering the uncer- 
tainty space of the output for this problem can be found in Ref. [6]. Nevertheless, 
the stochastic Bayesian updating framework employing the Bhattacharyya distance 
has been demonstrated to be more comprehensive and feasible than the Euclidian 
distance in solving this NASA UQ challenge problem. 
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Table 8.2 The updated results using the Euclidian and Bhattacharyya distances 

Updating coefficients |Predefined intervals |True values? Updated results 

Euclidian Bhattacharyya 

01: A 0.6, 0.8] 0.6364 0.6990 0.6194 

02: oa} [0.02, 0.04] 0.0356 = 0.0397 

03: p2 0, 1] 1.0 Z 0.9855 

84: M4 —5, 5] 4.0 = 3.9770 

05: of 0.0025, 4] 0.04 - - 

06: u5 =5;,.5] -15 = —4.4732 

07: oe 0.0025, 4] 0.36 = 0.2818 

Og: p —1, 1] 0.5 —0.3471 0.1804 


*Data available online at https://uqtools.larc.nasa.gov/nda-uq-challenge-problem-2014 [retrieved 
in 2017] 


8.6 Conclusions and Prospects 


The tendency of uncertainty analysis has been rendering the typical model updat- 
ing full of vitalities but also challenges. This chapter reviews the key techniques 
and components of the overall model updating campaign. Main emphasis is put on 
the involvement of uncertainty, which leads the transformation from the determinis- 
tic approach to the stochastic approach. The stochastic model updating is executed 
within the Bayesian model updating framework, where the Bhattacharyya distance is 
proposed as a novel UQ metric. The approximate likelihood is critical by providing a 
uniform connection between the Bayesian framework and various types of distance 
metrics. The example demonstrates that the Bhattacharyya distance is more compre- 
hensive and feasible than the Euclidian distance to calibrate distribution coefficients 
of parameters with imprecise probabilities. 

The Bhattacharyya distance is designed as a universal tool of UQ, which can be 
conveniently embedded into a technique route similar as the deterministic approach, 
but provides stochastic outcomes by capturing more uncertainty information. The 
tendency of uncertainty analysis will be further promoted by the novel UQ metric 
in not only the stochastic parameter calibration but also other procedures, e.g. the 
stochastic sensitivity analysis and model validation, which will establish the complete 
scenario of the stochastic model updating. 
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Chapter 9 A) 
Aerospace Flight Modeling get 
and Experimental Testing 


Olivier Chazot 


Abstract Validation processes for aerospace flight modeling require to articulate 
uncertainty quantification methods with the experimental approach. On this note, 
the specific strategies for the reproduction of re-entry flow conditions in ground- 
based facilities are reviewed. It shows how it combines high-speed flow physics with 
the hypersonic wind tunnel capabilities. 


9.1 Introduction 


Space missions are built upon massive technology knowledge and on the latest 
progress in engineering. They fascinate as they represent for the public the most 
advanced knowledge as well as a typical dive into the unknown. For scientists and 
engineers, such missions are an occasion to push the scientific knowledge and to 
establish better what we know to offer solid basis for further discoveries. 

In practice, space exploration leads to extreme challenges as it aims to investigate 
planets, or asteroids, in the solar system and return samples for analysis across very 
severe flight conditions. Such missions need to be designed using physical models 
and robust numerical methods. However, those tools used by aerospace engineers 
remain, for most of them, on the need for more research development for their 
validation and consolidation to be able to plan successful and fruitful missions. 

Aerothermodynamic testing is one of the crucial points for the design of aerospace 
vehicles. At first place, it aims at establishing as much as knowledge possible on 
critical flight phenomena. Ground-based facilities are operated to reproduce flight- 
relevant environment for the testing of the vehicle configuration and its Thermal 
Protection System (TPS) to allow for an accurate evaluation of their performances. 
Two types of facilities are classically used for the ground testing to support the 
pre-flight analysis. First, the required flow-field is reproduced for its analysis in 
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high-enthalpy facilities such as shock tunnels or expansion tubes. Then in a second 
step, the interaction between the dissociated gas and the vehicle’ surface is studied 
to determine the thickness of Thermal Protection Material (TPM) from databases 
built in plasma wind tunnels. Accurate flight duplication is thus necessary in order 
to properly address those stringent requirements without over-sizing the TPS. 

All the data produced are processed to define at best the modeling for all the 
required physical phenomena. Very performant frameworks for such processing are 
the Uncertainty Quantification (UQ) methods. Bayesian approach, in particular, are 
extremely powerful as they allow to determine the required information from a limited 
experimental knowledge with a probabilistic treatment. However, such key results, 
for hypersonic flight, are not only due to powerful mathematical treatment but also 
thanks to consistent experimental methodologies. This combination between mathe- 
matical and experimental approach is essential to generate useful knowledge on the 
physical modeling to be determined. Then it is primordial to have a good understand- 
ing on how the ground testing are linked to the fundamental mathematical models 
used for simulating the high-speed flow physics. It serves to setup the best experi- 
mental environment to allow a fruitful processing of the acquired data. 

Therefore, this note intends to review the rationale of the ground testing method- 
ology for aerospace vehicle. It offers a synthesis on the high-speed ground testing 
underlying the links to their scientific basis. It presents how similitude laws could 
be applied or need to be adapted as more and more physical phenomena have to 
be considered for aerospace flights. It would help engineers and applied mathemati- 
cians for working together facing the challenges of high-speed flight investigations 
for aerospace development. 


9.2 Aerospace Flights and Planetary Re-entry 


Aerospace flight can be considered for very different trajectories. It could follow a 
planet orbit up to the limit of its escape velocity, but it could also correspond to an 
interplanetary transfer with super-orbital velocities. Those situations involve a large 
variety of kinetic energy and trajectories. Figure 9.1 illustrates those typical orbit 
trajectories, around Earth, with their corresponding velocities. 

Super-orbital atmospheric re-entry, also known as hyperbolic re-entry, is charac- 
terized by high velocities and is encountered when a probe is entering an atmosphere 
from a hyperbolic orbit rather than from an elliptical orbit. It is the case for some 
interplanetary probes or sample return missions. Typical velocities for probes enter- 
ing the Earth’s atmosphere from hyperbolic orbits scale from 11 to 14 km/s, which 
correspond to specific enthalpies between 60 and 100 MJ/kg. This is considerably 
higher than the usual re-entry velocities from circular or elliptic orbits, e.g., 8.2 km/s 
for the Space Shuttle. Up to now, the Stardust probe was the fastest artificial object 
to perform a controlled re-entry in the Earth’s atmosphere, at 12.8 km/s. 

The environment of a high-speed re-entry flight is much more severe compare to a 
Low Earth Orbit (LEO) entry with particularly high heat flux and important stagnation 
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pressure. In addition to the classical features of re-entry flows like non-equilibrium 
thermo-chemistry and complex gas-surface interactions some specific phenomena 
become important as shock layer radiative heating and ablation phenomena. Those 
physical process were not considered much at smaller velocity but they cannot be 
anymore neglected for super-orbital re-entry. In top of this, more complex physical 
reality coupling phenomena appear in the flow between radiation and ablation that 
lead to a very intricate situation. It is therefore not possible to extrapolate what has 
been studied and learned for orbital re-entry to super-orbital conditions and specific 
ground testing strategies are required. 


9.3 Similitude Approach for Hypersonic Flows 


Similitude in classical fluid dynamics establishes a correspondence between different 
flows, based on the mathematical model representing these flows, without having to 
solve the set of equations. This correspondence then can be used to relate two real 
physical flow situations or to relate a family of solutions for the model. 

With two correlated applications: 


e The flow fields are similar (i.e., solution of the same set of equations) even if the 
dimensions and the temporal evolution of the phenomena are on different scales. 

e When applicable, the use of the similarity laws allows to replicate in wind tunnels 
the flow field occurring in flight around a re-entry vehicle. 


This second aspect of the similitude approach is mostly useful since it allows to study 
on ground typical re-entry situations. Hypersonic flows are particularly interesting 
for the application of similitude approach because it exists many different situa- 
tions in hypersonic regime which can be describes with a variety of models leading 
to different similitude laws. On one hand, it gives opportunities to develop sim- 
plify solutions, but on the other hand, the physical nature of the high-speed flows, 
mostly due to the high-temperature effects, severely restrict exact similitude and 
impose to study approximate similitude. This family of flows is essentially deter- 
mined by the mathematical model chosen to describe the flows and of which the 
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flows are themselves solutions. It appears that the similitude strategy evolves as the 
model integrates more and more physical aspects for the description of the flow. In 
hypersonics going from inviscid regime to high-temperature effects, the similitude 
approach will be adapted to retain the most relevant flow parameters corresponding 
to each case. Similitude in hypersonic has been studied and discussed by different 
authors: in earlier time, by Hayes and Probstein for inviscid and viscous hypersonics 
[11, 12], by Freeman for dissociated gases [9] and in a more general form by Viviand 
for CFD development [6], but it provides also very useful material for experimental 
studies. The most common approaches are briefly presented below, as well as their 
limitations. 


9.3.1 Inviscid Hypersonics 


At high Mach number when considering a slender body (small thickness ratio 
t << 1) at small angle of attack ( œ << 1) in a perfect gas, the governing Euler 
equations can be further simplified using the small disturbances theory. The similar- 
ity parameters are then Mt, at, and y. The slenderness ratio t is defined as t = d/l, 
where d is the body’s radius and 1 its length. The parameter K = MrT is called the 
hypersonic similarity parameter [11]. On these conditions, with a constant isentropic 
exponent y, the results of the similitude may be expressed in dimensionless form. 
For family of affinely related bodies of thickness ratio, t in two-dimensional flows 
the pressure coefficient C, could be written: 


C y+1 y+1\ 1 
4 =2 a ( J ) + =f(K,y) (9.1) 


The typical testing strategy in this inviscid framework could be represented as in 
the figure (Fig. 9.2). 

That approach holds for inviscid hypersonic, over a wide range of K for slender 
bodies as long as the Mach number in the flow is large enough and t small. Never- 
theless, it does not correlate well as the body thickness is increased, curved shock 
and boundary layers start to be predominant in the flow. As most hypersonic vehicles 
are blunt rather than slender for thermal considerations this hypersonic similarity has 
limited applications. 


9.3.2 Viscous Hypersonics 


Hypersonic flows present thick boundary layers that are also growing fast as Mach 
number is increased. These boundary layers cannot be ignored generally in hyper- 
sonic problems as they determine the major feature of the flow physics. They also 
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Fig. 9.2 Inviscid hypersonic similitude 


need to be taking into account for their interaction with the inviscid flow; ground- 
based facilities need to be designed toward those considerations to provide a way for 
studying these effects in relevant hypersonic flight conditions. To this end, similitude 
approach is a precious guide to identify which combination of flow parameters need 
to be taken into account in experimental simulation. Viscous hypersonic similitude 
has been presented first by Hayes and Probstein in a paper [12] where they review 
the general features of viscous similitude at high speed. The inviscid flow need to 
be represented, then the hypersonic similarity parameter K = Mr and y need to be 
invariant. The interaction of the viscous part of the flow with the inviscid field will 
be determined by the viscous-inviscid interaction parameter x expressed as 


M3,/C 
x= 200 N00. (9.2) 
V Rex, © 


with Coy = 4% and Reso = =. 

With these conditions the perfect gas model has to be assumed and the continuum 
hypothesis valid, i.e., the mean free path is at least one order of magnitude smaller 
than the characteristic length of the flow. The viscous hypersonic similarity requires 
reproducing the free-stream Reynolds (Re) and Mach number (M) and the tempera- 
ture ratio ae where the subscript w indicates the wall temperature and the oo symbol 
in the subscript refers to the free-stream. If the gas mixture is not the same, the heat 
capacity ratio y also needs to be reproduced [13]. This eases facility development, 
as lowering the gas temperature lowers the speed of sound, and thereby increases 


the achievable free-stream Mach number. The condensation temperature of the test 
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Fig. 9.3 Schematic representation of cold hypersonic testing in ground facility 


gas imposes the upper limit of what is achievable in a given facility. In a general 
view it resorts a Mach—Reynold simulation. Most of the ground testing capabilities 
are designed following this principle working with scaled models (Fig. 9.3). Their 
operation envelope are commonly pe in a Ma-Re graph that indicates which 
flight domain can be simulated (Fig. 9.4).! One has to remark that the temperature 
ratio parameter = Te is not often taking into account but it could be an important aspect 
to consider in around testing as it could appear as a limitation in the study. 


9.3.3 High-Temperature Hypersonics 


When a real gas, as air, experiences a strong shock at high speed, it will increase 
tremendously its thermal energy. In such conditions, the molecular collisions are 
energetic enough to cause dissociation and ionization and the gas to depart from 
the perfect gas model. For a blunt body at a velocity of 7 km/s, the temperature 
immediately after the shock is around 14,000 K, and around 8,000 K downstream 
the shock, where the flow may return to equilibrium. At such high temperatures, 
chemical effects have to be taken into account. For air at a pressure of 1 atm, vibra- 
tional excitation begins at 800 K, O2 begins to dissociate at 2,500 K and is fully 
dissociated for 4,000 K, point for which N> begins to dissociate. At 9,000 K, N3 
is fully dissociated and ionization begins. One can easily understand that the flow 
downstream the shock becomes a plasma: molecules are dissociated and atoms are 
partially ionized. The parameters to be respected for a flow involving chemistry are 
V?/2D, where V is the free-stream velocity and D the typical dissociation energy 
of the gas molecule considered, the Damköhler number Da, defined as Da = L/Ip, 


' Graphic extracted from AGARD AR 319. 
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where L is the characteristic length of the flow and /p the characteristic length asso- 
ciated to the dissociation reaction, and the temperature ratio T,,/T [13]. It should 
be brought to the reader’s attention that the gas behind the shock is a chemically 
reacting mixture of perfect gases, and not a real gas as it is sometime incorrectly 
referred to in literature. 

The Damkohler number for the gas appears considering the mass conservation 
equation in a non-dimensional form as it is expressed in the equation below: 


N, N, N, 
api i ” d ` v. : T 
TE + VV + Ji) = Yip — Vp) {Daye [ [0 —Daow TT 0)" } e3 
r=1 j=l j=l 
k fr Pool. kow PoL 
Day, = K fr Pooh Dap, = Kbw Poo (9.4) 
Ux U% 


Chemical reactions take a very short but finite time to happen. Assuming that 
the flow is composed of a single species, the dissociation rate is proportional to 
the density, while the recombination rate is proportional to the square of density. 
Hence, the characteristic lengths associated to the dissociation and recombination 
reactions, respectively, scale as 1p « 1/p and lp « 1/p, where p is the flow den- 
sity [13]. The recombination length is usually larger than the dissociation length. 
Under certain conditions, at very high altitude, one can assume that /p ~ L, and 
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therefore Da = o(1), while lę is much larger. The flow is frozen; it is too fast for 
recombination reactions to take place. In that case, the binary scaling parameter pL 
must be reproduced in order to obtain the correct Da [16]. One could note that 
the diffusion phenomena is also considered in this analysis, as the diffusion term 
in the equation scale with oL [5]. With this approach, the high-enthalpy facilities 
are designed to reproduce the real flight velocities (Va) and to allow for an opera- 
tion considering the pL parameter as they can be represented in the graph of Fig. 9.5. 


This leads to some complications. Firstly, the same air mixture as in real flight is 
commonly used, as the chemistry processes are too complicated to reproduce to use 
another one. The typical dissociation energy D is therefore conserved, and the actual 
free-stream velocity must be reproduced to duplicate the group V7/2D. Secondly, the 
required density to achieve in the wind tunnel becomes large in order to maintain the 
proper value of the binary scaling parameter for duplication of flight at lower altitude. 
Thirdly, the binary scaling approach, strictly speaking, is built upon the hypothesis 
of a single species mixture. That approach has therefore limited application for more 
complex mixtures such as air [7]. Finally, as altitude decreases, density increases and 
both /p and Ig become smaller. The binary scaling parameter does not hold anymore, 
as both pL and p?L should be reproduced at the same time. The same holds when 
flow velocity increases, and therefore also temperature. This prevents from using 
the similarity laws, and flow duplication can thus only be performed on a full-scale 
model, with the actual flow velocity. 
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9.4 Duplication of Dissociated Boundary Layer 
with Surface Reaction 


Following the development presented in the previous paragraph, it is observed that 
in spite of all the possibilities offer by the ground testing, considering the high- 
temperatures effects in hypersonic, only the dissociation in the shock layer could 
be simulated to some extend on scaled models. If one wanted to further take into 
account, the recombination in the gas he would have to consider full-scale models. 
Looking into more details, one could underline that all the important phenomena, 
concerning the heat transfer typically, are laying in the boundary layer. Figure 9.6 
gives an illustration of real flight situation in front of an Aerospace vehicle. 

The study could then be reduced to this confined layer in particular for the heat- 
transfer problems in hypersonic. In this situation, the full-scale environment will 
be reproduced by duplicating all the characteristics of the boundary layer in the 
ground testing facilities. This statement was already made in earlier publications 
from researchers working on dissociated boundary layers using shock tube facilities 
[19]. 

The stagnation point is of particular interest for this duplication because the flow 
conditions, returning to zero velocity, are more easily reproduced in a laboratory. 
Before identifying the parameters that need to be retained for the testing conditions 
it will be useful to give a physical description of reacting boundary layers, as they 
manifest themselves along surfaces in hypersonic flows. Dissociated non-equilibrium 
boundary layers with surface reaction have already been presented extensively by 
major authors [2, 19] and the reader is encouraged to consult these references. The 
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Fig. 9.6 Re-entry environment along stagnation line 
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purpose of this section is not to expose in details the theory specific to such boundary 
layers but to recall their main characteristics and better understand how they could be 
simulated in ground-based facilities. Boundary layers are the location where the diffu- 
sion phenomena are dominant, their relative importance are characterized by typical 
non-dimensional numbers: Prandtl (Pr), Lewis (Le), and Schmidt (Sc). Those will 
not be commented here to rather focus on the chemical non-equilibrium feature of the 
flow. The chemical reactions taking place in the gas phase are called homogeneous 
reactions while those happening between the gas and the solid surface are called 
heterogeneous reactions. The chemical non-equilibrium in the gas phase is charac- 
terized by the gas Damkoéhler number (Dag). It corresponds to the ratio between 
the typical time of the flow, to cross the boundary layer, to a typical reaction time 
for the gas chemistry: Da, = Tf /Te. When Dag — oo the boundary layer reaches 
a local thermodynamic equilibrium (LTE). On the contrary when Da, — 0 it leads 
to a frozen boundary layer, where no chemistry happen. These different conditions 
have significant consequences on the wall heat flux as it has been shown in reference 
[8]. The reactions at the wall are usually considered as first-order reactions, they are 
represented by a reaction rate (kw) for each dissociated specie. ką could be related to 
a recombination coefficient y; (or catalycity parameter), interpreted as a probability 
of recombination at the wall, by the Hertz—Knudsen formula: 


k-T, 
2x - Mi 


Kyi = Vi (9.5) 


The wall reaction rates are also characterized by a surface Damkohler number 
(Da,,). It compares the time of diffusion for the species across the boundary layer to 
the time of reaction at the wall: Daw = tpir¢/Treact- When Day — 00, it does not 
necessarily imply that the reaction rate at the wall tend to infinity (kw — oo), but 
simply that the surface reaction are much more faster than the diffusion process. It 
is said that the GSI phenomena are “limited by diffusion’. In the other extreme case, 
when Da,, — 0 the diffusion is much faster than the reaction and the GSI phenomena 
are “limited by reaction” or “reaction controlled”. From this description, it appears 
that the diffusion and reaction processes should be accurately reproduced. To this 
extend, it could be understood that the dimension and the environment of the reaction 
boundary layer must be duplicated (Fig. 9.7). Two situations could be distinguished 
for the boundary layer to be duplicated in the laboratory: stagnation point region and 
off-stagnation point when the boundary is developing along the surface. 

If one considers only the stagnation point region, it has been shown that a complete 
duplication of real flight condition is possible in ground facility, if the total enthalpy 
(He), the total pressure (Pe), and the velocity gradient (6 = du/dx), of the flight 
conditions, can be matched locally on the test sample [4, 14]. In this case, the testing 
is realized in plasma wind tunnels (Arcjet or Plasmatron facilities) that are able to 
produce dissociated flows for a long time base which is suitable for tests involving 
aerothermochemistry. The theoretical frame for the testing with ICP wind tunnels 
has been adapted by Russian scientists, in a methodology called Local Heat Transfer 
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Fig. 9.7 Typical features of the dissociated stagnation point boundary layer 


Simulation (LHTS) [14] and an assessment of this methodology has been conducted 
at the VKI Plasmatron facility [1, 4]. The duplication of the flight condition at 
stagnation point is strictly reduced to the boundary layer with its appropriate treatment 
[1]. In the case of a subsonic plasma facility, like the VKI Plasmatron, the testing 
configuration could be presented as in Fig. 9.8. 

The experimental assessment of the LHTS methodology has been conducted at 
VKI by Chazot et al. [4] and Barbante and Chazot [1]. The results are presented 
on the Figs. 9.9 and 9.10. It could be seen that the boundary layers profiles from 
the hypersonic flow is very well duplicated with the subsonic plasma flow, when 
matching the characteristics parameters at the edge of the boundary layer. 
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Fig. 9.8 TPS testing in plasma wind tunnel in LHTS conditions 
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Fig. 9.9 Temperature 
profiles comparison between 
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conditions [1] 
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9.5 Considering Flow Radiation 


Distance from the wall, m 


Shock layer radiative heating appears around 9 km/s in Earth’s atmosphere and 7 
km/s in Mars’ atmosphere [17]. It reaches 10 % of the total heat flux for probes having 
a diameter smaller than | m and entry velocities approaching 13 km/s in the Earth’s 
atmosphere [21]. It is then a physical phenomena to be considered for super-orbital 


re-entry. 
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However on ground testing for scaled models flow radiation is not often addressed 
in the literature. It is usually mentioned to explain that this feature of the flow cannot 
be properly reproduced considering similarity rules for a model of reduced dimension 
[18]. But even if this conclusion is fully valid some more details could be added to 
better understand the issues with radiative heat-transfer in hypersonic facilities. 

The radiative flux at a certain point P is the integral of the radiative intensity 
in all directions and over the entire frequency spectrum. Considering a point S on 
a surface, the radiative flux reaching S from a point P of the surrounding is the 
difference between the energy emitted and that absorbed integrated along the optical 
path from P to S. In the case of a scaled model in a high-enthalpy facility, respecting 
the same condition for the gas and the flow velocity, it could be shown that the optical 
thickness in the radiating shock layer scales with the product pL, considering the 
absorption coefficient remaining the same on the two situations. 

One is then brought back to the same approach as the binary scaling exposed 
before. In these conditions, the radiative heat-flux on the surface of a scaled model 
could be reproduced in a ground-based facility. 

However, if one considers the flow passing through the radiating environment 
around the model, it appears that the scaling does not hold any longer. The amount 
of energy E radiated by a control volume is proportional to the mass contained in that 
volume: E «x m = p.L?. The amount of flow m ingested in the shock layer could be 
expressed as ñ X p.U>..L*. Therefore, when the flow-radiation coupling need to be 
taken into account the heat radiated per unit mass passing in a control volume scales 
as: E/m x L. In conclusion, even if the radiative flux on the surface of a scaled 
model could be reproduced, the radiative heating of the flow around the same model 
is not respected. 

This problem arises if the radiative heating is important enough to have an influ- 
ence on the rest of the flowfield. This coupling is quantified by the radiative cooling 
parameter I, also referred to as Goulard number, defined as 


r= 2: Qa (9.6) 
1/2 poo U3, 

where Q74 is the radiative heating for an adiabatic flow, that is without radiative 
cooling, pə the free-stream density, and v the shock velocity. This parameter serves 
as an approximate measure of the coupling between radiation and the flow [10]. It is 
the ratio of the amount of radiation generated by the shock, assumed to be twice the 
radiative heating of the surface, by the kinetic energy heat flux entering the shock 
layer. If Fr > 0.01, radiation is coupled to the rest of the flowfields and the effect of 
improper radiation scaling extends to the rest of the flowfield. 

Furthermore, there is a coupling between radiation and gas—surface interaction 
processes. Indeed, the use of ablative material is compulsory for high-speed re- 
entry. The ablation processes are very efficient to prevent the hot shock layer gas 
from reaching the wall and absorb part of the shock layer radiative heat flux. An 
accurate estimation of the absorbed radiation is complex since the thickness and 
thermochemical state of the ablation gas layer are difficult to predict up to now [17]. 
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Other surface phenomena such as catalycity and oxidation should be also taken into 
account, it makes the problem even more intricate when one is aware that such a 
models are not yet fully established, especially in high-enthalpy flows [3]. 


9.6 Ground Testing Strategy for High-Speed Re-entry 


Ground testing facilities commonly used to study re-entry flows and that are mainly 
concerned with high-enthalpy flows can be divided in two categories: 


e Impulse facilities, such as shock tubes and ballistic ranges, are only able to produce 
flows that last typically a fraction of a second. It is usually assumed long enough 
to let the steady flow establish itself, but too short compared to the thermal inertia 
of the material surface. They are thus mainly used to investigate the aerothermo- 
dynamic effects, gas kinetic, and radiation processes. In this category, expansion 
tubes are able to reach free-stream enthalpies characteristic of high-speed re-entry. 

e Plasma wind tunnels, such as inductively coupled plasma facilities or arc-jets, are 
able to operate for long test durations, in the order of minutes. However, they have 
not been designed to reproduce the flow radiation. 


Similar flights conditions or direct flight duplication are possible to reproduce for 
sub-orbital re-entry velocity in those facilities for a limited time (impulse facilities) 
or in a limited region (stagnation point in plasma wind tunnels). Each category of 
facility is addressing a specific aspect of the flowfield as it is sketched in Fig. 9.11. 

The velocity of the flow is much higher in the case of high-speed re-entry which 
concern super-orbital conditions. This results in considerably higher free-stream 
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Fig. 9.11 Different types of facilities for flow radiation or gas—surface interaction studies 
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enthalpy and pressure in the flow, and leads to coupling phenomena that cannot be 
reproduced on scaled down models. 

High-speed re-entry requires therefore a new approach of ground testing. Ground 
testing facilities should be used to investigate separately different phenomenon play- 
ing a role in the aerothermodynamic of the flow, rather than to duplicate flight, as it 
is the case for lower velocity re-entry. 

Those investigations should be performed on a panel of different facilities in order 
to develop models and databases. Models of shock layer radiation can be developed 
based on measurements performed in impulse facilities, under conditions similar 
to those encountered in high-speed re-entry. However, material processes such as 
ablation and radiative heating cannot be performed in the same facility due to the 
short test duration involved. In particular, gas—surface interaction have to be studied 
in plasma wind tunnels. As it is known, those facilities could produce the required 
heat flux level, but have not been designed to take into account the correct coupling 
phenomena including the radiation processes present below a shock layer. Models 
of gas—surface interaction have therefore to be developed under different conditions 
than that of high-speed re-entry. 

Those models, developed separately, should then be implemented in Computa- 
tional Fluid Dynamics (CFD) codes and allow extrapolatation to the actual flight con- 
ditions. Since they cannot be validated within the flight conditions envelope unless 
flight-testing is performed, they need to capture the main physical phenomena and 
their accuracy is of prime importance. This, in turn, allows sizing and designing 
re-entry probes as well as assessing their performance in flight, with computational 
methods rather than direct ground testing facilities. The general framework of the 
testing methodology for high-speed re-entry is summarized in Fig. 9.12. 


146 O. Chazot 


9.7 Conclusion 


The reproduction of high-speed re-entry conditions in ground-based facilities is a 
real challenge. Because of the coupling phenomena that characterized this type of 
flows their accurate experimental simulation on scaled-down models is impossible in 
ground-based facilities. It appears as well that only little research has been conducted 
on dedicated strategies for ground testing of high-speed re-entry flows. Most of the 
time, testing is limited to qualification tests that reproduce the heat flux level without 
taking into account all the physical phenomena involved. 

The two main limitations concern radiation and gas—surface interactions: both 
cannot be correctly reproduced at the same time in a single facility. Indeed, the time- 
scale achieved in impulse facilities is shorter than those relevant for gas—surface 
interactions, while the correct radiation phenomena are difficult to reproduce in 
plasma wind tunnels. 

A solution is to develop models for radiation and gas—surface interaction in the 
relevant facilities, under controlled environments. Since the conditions in which 
those models can be validated in ground facilities are different from the one encoun- 
tered in high-speed flows, they specifically require to be physic based in view of their 
extrapolation to flight conditions. In such a context, model validation and their incor- 
poration in CFD codes are crucial for the development of aerospace applications. UQ 
methodologies are expected to play a major role for this approach as they represent 
the unique way to give solid basis to the validation process. In order to manifest all 
their benefit and correctly address the problem, UQ methods imperatively need to be 
articulated with the experimental procedure. To this end, this brief review exposes 
the rationale of experimental testing and how it is linked to the physics of aerospace 
flights. It would serve as a basis for the development of Uncertainty Quantification 
apply to the validation of high-speed flow modeling. 
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